Adaptive Estimation of a Distribution Function
and its Density in Sup-Norm Loss by
Wavelet and Spline Projections

EVARIST GINÉ AND RICHARD NICKL

University of Connecticut 
May 2008 

Abstract 

Given an i.i.d. sample from a distribution $F$ on $\mathbb R$ with uniformly continuous density $p_0$, purely data-driven estimators are constructed that efficiently estimate $F$ in sup-norm loss, and simultaneously estimate $p_0$ at the best possible rate of convergence over Hölder balls, also in sup-norm loss. The estimators are obtained from applying a model selection procedure close to Lepski's method with random thresholds to projections of the empirical measure onto spaces spanned by wavelets or B-splines. Explicit constants in the asymptotic risk of the estimator are obtained, as well as oracle-type inequalities in sup-norm loss. The random thresholds are based on suprema of Rademacher processes indexed by wavelet or spline projection kernels. This requires Bernstein-analogues of the inequalities in Koltchinskii (2006) for the deviation of suprema of empirical processes from their Rademacher symmetrizations.

MSC 2000 subject classification: Primary: 62G07; Secondary: 60F05.

Key words and phrases: adaptive estimation, Rademacher processes, sup-norm loss, 
wavelet estimator, spline estimator, oracle inequality, Lepski's method. 

1 Introduction 

If $X_1, \dots, X_n$ are i.i.d. with unknown distribution function $F$ on $\mathbb R$, then classical results of mathematical statistics establish optimality of the empirical distribution function $F_n$ as an estimator of $F$. That is to say, if we assume no a priori knowledge whatsoever on $F$, and equip the set of all probability distribution functions with some natural loss function, such as sup-norm loss, then $F_n$ is asymptotically sharp minimax for estimating $F$. (The same is true even if more is known about $F$, for instance if $F$ is known to have a uniformly continuous density.) However, this does not preclude the existence of other estimators that are also asymptotically minimax for estimating $F$ in sup-norm loss, but which improve upon $F_n$ in other respects. In particular, if one believes that $F$ is absolutely continuous, then one may want to simultaneously obtain a reasonable estimate of the density of $F$. What we have in mind as a 'reasonable estimate' of the density of $F$ is a purely data-driven adaptive estimator that achieves the best rates of convergence in some relevant loss function over some prescribed classes of densities.

Our goal in the present article is to construct density estimators that satisfy the two properties just described, more concretely, the functional central limit theorem (CLT) for the distribution function and adaptation in sup-norm loss to the unknown smoothness of the density, assuming at most uniform continuity for the density of $F$, and involving reasonable constants in both the risk bounds and the estimation procedure.

To achieve adaptation one can opt for several approaches, all of which are related. Among them we mention the penalization method of Barron, Birgé and Massart (1999), wavelet thresholding (Donoho, Johnstone, Kerkyacharian and Picard (1996)), and Lepski's (1991) method. Our choice for the goal at hand consists of using Lepski's method, with random thresholds, applied to wavelet and spline projection estimators of a density.

The linear estimators underlying our procedure are projections of the empirical measure onto spaces spanned by wavelets, and wavelet theory is central to some of the derivations of this article. The wavelets most commonly used in statistics are those that are compactly supported (e.g., Daubechies' wavelets), and our results readily apply to these. However, for computational and other purposes, projections onto spline spaces are also interesting candidates for the estimators. Density estimators obtained from projecting the empirical measure onto Schoenberg spaces spanned by B-splines were studied by Huang and Studden (1993). As is well known in wavelet theory, the Schoenberg spline spaces with equally spaced knots have an orthonormal basis consisting of the Battle-Lemarié wavelets, so that the spline projection estimator is in fact exactly equal to the wavelet estimator based on Battle-Lemarié wavelets. These wavelets do not have compact support, but they enjoy exponential decay at infinity. Although we cannot handle exponentially decaying wavelets in general, we can still work with Battle-Lemarié wavelets because the B-spline expansion of the projections allows us to show that the relevant classes of functions are of Vapnik-Červonenkis type, so that empirical process techniques can be applied. In particular, the adaptive estimators we devise in Theorem 3 may be based either on spline projections or on compactly supported wavelets. And in the process of proving the main theorem, we also provide new asymptotic results for spline projection density estimators similar to those for wavelet estimators in Giné and Nickl (2007).

We need to use Talagrand's inequality with sharp constants (Bousquet (2003), Klein and Rio (2005)) in the proofs, but to do this, we have to estimate the expectation of suprema of certain empirical processes that appear in the centering of Talagrand's inequality. The use of entropy-based moment inequalities for empirical processes typically results in constants that are too conservative (e.g., in Giné and Nickl (2008)). In order to remedy this problem, we adapt recent ideas due to Koltchinskii (2001, 2006) and Bartlett, Boucheron and Lugosi (2002) to density estimation: the entropy-based moment bounds are replaced by the sup-norm of the associated Rademacher averages, which are, with high probability, better estimates of the expected value of the supremum of the empirical process. We derive a Bernstein-type analogue of an exponential inequality in Koltchinskii (2006) that shows how the supremum of an empirical process deviates from the supremum of the associated Rademacher process. This Bernstein-type version allows one to use partial knowledge of the variance of the empirical processes involved, which is crucial for applications in our context of adaptive density estimation. Moreover, we show that one can use, instead of the supremum of the Rademacher process, its conditional expectation given the data (which is more stable). We should also remark on recent interest in obtaining inequalities similar to those in Koltchinskii (2006) for general bootstrap procedures, see Fromont (2007). Since many bootstrap empirical processes (such as the ones obtained from Efron's bootstrap) are minorized by Rademacher processes, our inequalities apply there as well, but may give suboptimal constants.

Adaptive estimation in sup-norm loss is a relatively recent subject, and we should first mention the results in Tsybakov (1998) and Golubev, Lepski and Levit (2001) that were obtained in the Gaussian white noise model. Tsybakov (1998) devises procedures that are sharp adaptive (attaining the optimal constant in the asymptotic risk) in sup-norm loss, when the unknown function lies in a Sobolev ball of order $\beta$ (so that the rate of convergence is no better than it would be for functions that are Hölder-continuous of order $\beta - 1/2$). If, as Tsybakov (1998) does, one uses Fourier expansions in the white noise framework, the Gaussian processes involved turn out to be stationary, so direct methods (e.g., the Rice formula) can be used to control suprema of the relevant random quantities. These methods extend to somewhat more general basis expansions, in which case the corresponding Gaussian processes are mildly nonstationary, see Lemma 1 in Golubev, Lepski and Levit (2001). However, if one is interested in adapting to a Hölder-continuous density in sup-norm loss in the i.i.d. density model on $\mathbb R$, this greatly simplifying structure is not available; in particular, trigonometric basis expansions are not optimal for approximating Hölder-continuous functions in the supremum norm. Working with the more adequate wavelet bases seems to require different methods; in this case, even after reduction to a Gaussian model by strong approximation, the resulting Gaussian processes cannot be dealt with as in the aforementioned articles, for several reasons, including lack of stationarity, non-differentiability of the covariance function (for many wavelets), and the fact that we are interested in suprema over the whole real line. We show that empirical process techniques can be used in the i.i.d. density model to achieve rate-adaptive estimators, and we establish reasonable bounds for the asymptotic constant in sup-norm risk. The constants we obtain here are not sharp (as compared to the optimal ones obtained in Korostelev and Nussbaum (1999) for densities supported in $[0, 1]$): this does not come as a surprise since a) at least some loss is to be expected for adaptive procedures over Hölder classes, cf. Lepski (1992) and Tsybakov (1998), and b) Rademacher symmetrization increases the constants in the large deviation bounds we use.

In the i.i.d. density model, a direct 'competitor' to the estimators constructed in this article is the hard thresholding wavelet density estimator introduced in Donoho et al. (1996): as proved in Giné and Nickl (2007), its distribution function satisfies the functional CLT and it is adaptive in the sup-norm over Hölder balls; however, the proofs there require the additional assumption that $dF$ integrates $|x|^\delta$ for some $\delta > 0$, and the constants appearing in the threshold and the risk become quite large for small $\delta$. The results in the present article hold under no moment condition whatsoever. Giné and Nickl (2008) construct another estimator that asymptotically optimally estimates $F$ and its density in sup-norm loss. The estimator there was constructed by applying Lepski's method to classical kernel estimators, modified by imposing 'by force' that their distribution functions stay at a uniform distance $o(1/\sqrt n)$ from $F_n$. In the present situation, if $F$ has a uniformly continuous density, we do not need to force the estimator to stay in an $o(1/\sqrt n)$ ball around $F_n$, which reduces the complexity of the method, and we also avoid the large constants that resulted from the entropy bounds in Giné and Nickl (2008).

There has been recent interest in nonasymptotic risk bounds for adaptive estimators. Rigollet (2006) obtained sharp oracle inequalities (with monotone oracles) in $L^2(\mathbb R)$-loss, using a Stein-type density estimator. He builds on results of Cavalier and Tsybakov (2001) that were obtained in the Gaussian white noise model, and the methods employed there are closely tied to Hilbert-space structure. We prove an oracle inequality in sup-norm loss for estimators based on Haar wavelets, but with the following constraints: First, the constant we obtain cannot be made arbitrarily close to one, and second, we have to assume that the true density has at least one point where it attains a critical Hölder singularity. The latter condition is related to the notion of self-similar functions in Jaffard (1997a, b), and also arises in related problems such as the construction of pointwise adaptive confidence intervals, cf. Picard and Tribouley (2000).

The outline of the article is as follows: In Section 2 we define the basic linear estimators and 
give some of their asymptotic properties. In Section 3, building on Talagrand's inequality, we 
derive a Bernstein-type inequality for the deviation of the (supremum of the) empirical process 
from the (supremum of the) associated Rademacher process. In Section 4, we construct the 
adaptive procedures and give the main results. Most of the proofs are deferred to Section 5. 
2 Wavelet expansions and estimators



We start with some basic notation. If $(S, \mathcal S)$ is a measurable space, then for Borel-measurable functions $h : S \to \mathbb R$ and Borel measures $\mu$ on $S$ we set $\mu h := \int_S h\,d\mu$. We will denote by $L^p(Q) := L^p(S, Q)$, $1 \le p \le \infty$, the usual Lebesgue spaces on $S$ w.r.t. a Borel measure $Q$, and if $Q$ is Lebesgue measure on $S = \mathbb R$ we simply denote this space by $L^p(\mathbb R)$, and its norm by $\|\cdot\|_p$ if $p < \infty$. We will use the symbol $\|h\|_\infty$ to denote $\sup_{x\in\mathbb R}|h(x)|$ for $h : \mathbb R \to \mathbb R$. For $s \in \mathbb N$, denote by $C^s(\mathbb R)$ the space of functions $f : \mathbb R \to \mathbb R$ that are $s$-times differentiable with uniformly continuous $D^s f$, equipped with the norm $\|f\|_{s,\infty} = \sum_{0\le\alpha\le s}\|D^\alpha f\|_\infty$, with the convention that $D^0 := \mathrm{id}$, so that $C(\mathbb R) := C^0(\mathbb R)$ is the space of bounded uniformly continuous functions. For noninteger $s > 0$ and $[s]$ the integer part of $s$, set

$$C^s(\mathbb R) := \Big\{f \in C^{[s]}(\mathbb R) : \|f\|_{s,\infty} := \sum_{0\le\alpha\le[s]}\|D^\alpha f\|_\infty + \sup_{x\ne y}\frac{|D^{[s]}f(x) - D^{[s]}f(y)|}{|x - y|^{s-[s]}} < \infty\Big\}.$$

We also define the 'local' Hölder constant

$$H(s, f) := \sup_{x\in\mathbb R}\sup_{y\ne 0}\frac{|f(x + y) - f(x)|}{|y|^s} \qquad (1)$$

for $0 < s < 1$, and we set $H(0, f) := \|f\|_\infty$.

2.1 Multiresolution analysis and wavelet bases 

We recall here a few well-known facts about wavelet expansions, see, e.g., Härdle, Kerkyacharian, Picard and Tsybakov (HKPT, 1998). Let $\phi \in L^2(\mathbb R)$ be a father wavelet, that is, $\phi$ is such that $\{\phi(\cdot - k) : k \in \mathbb Z\}$ is an orthonormal system in $L^2(\mathbb R)$, and moreover the linear spaces $V_0 = \{f(x) = \sum_k c_k\phi(x - k) : \{c_k\}_{k\in\mathbb Z} \in \ell^2\}$, $V_1 = \{h(x) = f(2x) : f \in V_0\}$, $\dots$, $V_j = \{h(x) = f(2^j x) : f \in V_0\}$, $\dots$, are nested ($V_{j-1} \subset V_j$ for $j \in \mathbb N$) and their union is dense in $L^2(\mathbb R)$. In the case where $\phi$ is a bounded function that decays exponentially at infinity (i.e., $|\phi(x)| \le Ce^{-\gamma|x|}$ for some $C, \gamma > 0$), which we assume for the rest of this subsection, the kernel of the projection onto the space $V_j$ has certain properties: First, the series

$$K(y, x) := K(\phi, y, x) = \sum_{k\in\mathbb Z}\phi(y - k)\phi(x - k) \qquad (2)$$

converges pointwise, and we set $K_j(y, x) := 2^j K(2^j y, 2^j x)$, $j \in \mathbb N \cup \{0\}$. Furthermore we have

$$|K(y, x)| \le \Phi(|y - x|) \quad\text{and}\quad \sup_{x\in\mathbb R}\sum_k|\phi(x - k)| < \infty, \qquad (3)$$

where $\Phi : \mathbb R \to \mathbb R^+$ is bounded and has exponential decay. If $f \in L^p(\mathbb R)$, $1 \le p \le \infty$, and $j$ is fixed, then the projection of $f$ onto $V_j$ is

$$K_j(f)(y) := \int K_j(x, y)f(x)\,dx = 2^j\sum_{k\in\mathbb Z}\phi(2^j y - k)\int\phi(2^j x - k)f(x)\,dx, \qquad y \in \mathbb R,$$

the series converging pointwise. For $f \in L^1(\mathbb R)$, which is the main case in this article, the convergence of the series in fact takes place in $L^p(\mathbb R)$, $1 \le p \le \infty$. This still holds true if $f(x)dx$ is replaced by $d\mu(x)$, where $\mu$ is any finite signed measure. If now $\phi$ is a father wavelet and $\psi$ the associated mother wavelet, so that $\{\phi(\cdot - k), 2^{l/2}\psi(2^l(\cdot) - k) : k \in \mathbb Z, l \in \mathbb N \cup \{0\}\}$ is an orthonormal basis of $L^2(\mathbb R)$, then any $f \in L^p(\mathbb R)$ has the formal expansion

$$f(y) = \sum_k\alpha_k(f)\phi(y - k) + \sum_{l=0}^\infty\sum_k\beta_{lk}(f)\psi_{lk}(y), \qquad (4)$$

where $\psi_{lk}(y) = 2^{l/2}\psi(2^l y - k)$, $\alpha_k(f) = \int f(x)\phi(x - k)\,dx$ and $\beta_{lk}(f) = \int f(x)\psi_{lk}(x)\,dx$. Since $(K_{l+1} - K_l)(f) = \sum_k\beta_{lk}(f)\psi_{lk}$, the partial sums of the series (4) are in fact given by

$$K_j(f)(y) = \sum_k\alpha_k(f)\phi(y - k) + \sum_{l=0}^{j-1}\sum_k\beta_{lk}(f)\psi_{lk}(y), \qquad (5)$$

and, if $\phi$ and $\psi$ are bounded and have exponential decay, then (5) holds pointwise, and it also holds in $L^p(\mathbb R)$, $1 \le p \le \infty$, if $f \in L^1(\mathbb R)$ or if $f$ is replaced by a finite signed measure. Using these facts, one can furthermore show that the wavelet series (4) converges in $L^p(\mathbb R)$, $1 \le p < \infty$, for $f \in L^p(\mathbb R)$, and we also note that if $p_0$ is a uniformly continuous density, then its wavelet series converges uniformly.

2.2 Density Estimation using wavelet and spline projection kernels 

Let $X_1, \dots, X_n$ be i.i.d. random variables with common law $P$ and density $p_0$ on $\mathbb R$, and denote by $P_n = n^{-1}\sum_{i=1}^n\delta_{X_i}$ the associated empirical measure. A natural first step is to estimate the projection $K_j(p_0)$ of $p_0$ onto $V_j$ by

$$p_n(y) := p_n(y, j) = \frac1n\sum_{i=1}^n K_j(y, X_i) = \sum_k\hat\alpha_k\phi(y - k) + \sum_{l=0}^{j-1}\sum_k\hat\beta_{lk}\psi_{lk}(y), \qquad y \in \mathbb R, \qquad (6)$$

where $K_j$ is as in (2), $j \in \mathbb N$, and where $\hat\alpha_k = \int\phi(x - k)\,dP_n(x)$ and $\hat\beta_{lk} = \int\psi_{lk}(x)\,dP_n(x)$ are the empirical wavelet coefficients. We note that for $\phi$, $\psi$ compactly supported (e.g., Daubechies' wavelets), there are only finitely many $k$'s for which these coefficients are nonzero. This estimator was first studied by Kerkyacharian and Picard (1992) for compactly supported wavelets.
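For the Haar wavelet ($\phi = 1_{[0,1)}$, a compactly supported choice used here purely as an illustration), the kernel in (2) reduces to $K(y, x) = 1\{\lfloor y\rfloor = \lfloor x\rfloor\}$, so $p_n(\cdot, j)$ is a dyadic histogram. The Python sketch below, written under this Haar assumption with function names of our own choosing, evaluates (6) once through the kernel and once through the empirical coefficients $\hat\alpha_k$, $\hat\beta_{lk}$, which lets one check the identity in (6) numerically.

```python
import math
import random


def phi(x):
    """Haar father wavelet: the indicator of [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


def psi(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def psi_lk(x, l, k):
    """psi_{lk}(x) = 2^{l/2} psi(2^l x - k)."""
    return 2.0 ** (l / 2) * psi(2 ** l * x - k)


def kernel(y, x, j):
    """K_j(y, x) = 2^j K(2^j y, 2^j x) with K(y, x) = 1{floor(y) == floor(x)}."""
    return 2.0 ** j if math.floor(2 ** j * y) == math.floor(2 ** j * x) else 0.0


def p_n_kernel(y, sample, j):
    """Kernel form of (6): n^{-1} sum_i K_j(y, X_i)."""
    return sum(kernel(y, x, j) for x in sample) / len(sample)


def p_n_coefficients(y, sample, j):
    """Coefficient form of (6). For Haar, only one k per level has
    phi(y - k) != 0 or psi_lk(y) != 0, namely k = floor(2^l y)."""
    n = len(sample)
    k0 = math.floor(y)
    alpha = sum(phi(x - k0) for x in sample) / n
    value = alpha * phi(y - k0)
    for l in range(j):
        k = math.floor(2 ** l * y)
        beta = sum(psi_lk(x, l, k) for x in sample) / n
        value += beta * psi_lk(y, l, k)
    return value


# Example: a uniform sample; both forms of (6) agree up to rounding.
rng = random.Random(0)
sample = [rng.random() for _ in range(200)]
print(p_n_kernel(0.3, sample, 3), p_n_coefficients(0.3, sample, 3))
```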

If the wavelets $\phi$ and $\psi$ do not have compact support, it may be impossible to compute the estimator exactly, since the sums over $k$ consist of infinitely many summands. However, in the special case of the Battle-Lemarié family $\phi_r$, $r \ge 1$ (see, e.g., Section 6.1 in HKPT (1998)), which is a class of non-compactly supported but exponentially decaying wavelets, the estimator has a simple form in terms of splines: the associated spaces $V_{j,r} = \{\sum_k c_k 2^{j/2}\phi_r(2^j(\cdot) - k) : \sum_k c_k^2 < \infty\}$ are in fact equal to the Schoenberg spaces generated by the Riesz basis of B-splines of order $r$, so that the sum in (6) can be computed by

$$p_n(y, j) = \frac1n\sum_{i=1}^n\kappa_j(y, X_i) = \frac{2^j}{n}\sum_{i=1}^n\sum_k\sum_l b_{kl}N_{j,k,r}(X_i)N_{j,l,r}(y), \qquad (7)$$

where the $N_{j,k,r}$ are (suitably translated and dilated) B-splines of order $r$, the kernel $\kappa_j$ is as in (34) below and the $b_{kl}$'s are the entries of the inverse of the matrix defined in (33) below. An exact derivation of this spline projection, its wavelet representation and detailed definitions are given in Section 5.1. It turns out that for every sample point $X_i$ and for every $y$, each of the last two sums extends only over $r$ terms. We should note that this 'spline projection' estimator was first studied (outside of the wavelet setting) by Huang and Studden (1993), who derived pointwise rates of convergence. See also Huang (1999), where some comparison between Daubechies' and spline wavelets can be found.
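The locality claim (each of the last two sums in (7) has only $r$ nonzero terms) can be checked directly. The Python sketch below assumes, for illustration, that $N_{j,k,r}(x) = N_r(2^j x - k)$ with $N_r$ the cardinal B-spline of order $r$ on the integer knots $0, 1, \dots, r$; the precise normalization used in Section 5.1 may differ, and the Gram-matrix inverse $b_{kl}$ is not computed here.

```python
import math


def bspline(x, r):
    """Cardinal B-spline N_r of order r (degree r - 1) with knots 0, 1, ..., r,
    evaluated by the Cox-de Boor recursion; N_1 is the indicator of [0, 1)."""
    if r == 1:
        return 1.0 if 0.0 <= x < 1.0 else 0.0
    return (x * bspline(x, r - 1) + (r - x) * bspline(x - 1.0, r - 1)) / (r - 1)


def active_indices(y, j, r):
    """Integers k with N_r(2^j y - k) != 0. Since N_r vanishes outside [0, r),
    only k with 2^j y - r < k <= 2^j y can qualify: at most r of them."""
    u = 2 ** j * y
    k_lo, k_hi = math.floor(u - r) + 1, math.floor(u)
    return [k for k in range(k_lo, k_hi + 1) if bspline(u - k, r) != 0.0]


for r in (1, 2, 3, 4):
    ks = active_indices(0.3, 3, r)
    # Partition of unity: the active B-spline values sum to 1.
    print(r, ks, sum(bspline(2 ** 3 * 0.3 - k, r) for k in ks))
```

For $r = 3$ and $2^j y = 2.4$, for instance, the active indices are $k = 0, 1, 2$, matching the bound of $r$ terms.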

In the course of proving the main theorem of this article, we will derive some basic results for the linear spline projection estimator (7), which we now state. For classical kernel estimators, results similar to those that follow were obtained in Giné and Guillou (2002) and Giné and Nickl (2008), and for wavelet estimators based on compactly supported wavelets, this was done in Giné and Nickl (2007).

Theorem 1 Suppose that $P$ has a bounded density $p_0$. Assume $j_n \to \infty$, $n/(j_n 2^{j_n}) \to \infty$, $j_n/\log\log n \to \infty$ and $j_{2n} - j_n \le \tau$ for some $\tau$ positive. Let $p_n(y) = p_n(y, j_n)$ be the estimator from (7) for some $r \ge 1$. Then

$$\limsup_{n\to\infty}\sqrt{\frac{n}{2^{j_n}j_n}}\,\sup_{y\in\mathbb R}|p_n(y) - Ep_n(y)| = C \quad\text{a.s.},$$

and, for $1 \le p < \infty$,

$$\sup_n\sqrt{\frac{n}{2^{j_n}j_n}}\Big(E\sup_{y\in\mathbb R}|p_n(y) - Ep_n(y)|^p\Big)^{1/p} \le C',$$

where $C$ and $C'$ depend only on $\|p_0\|_\infty$ and on $r$, $p$, $\tau$. Furthermore, if $p_0 \in C^t(\mathbb R)$ with $t \le r$, one has

$$\sup_{y\in\mathbb R}|p_n(y) - p_0(y)| = O\Big(\sqrt{\frac{2^{j_n}j_n}{n}}\Big) + O(2^{-tj_n}) \quad\text{both a.s. and in } L^p(P),$$

and, if in addition $2^{-j_n} \simeq (n/\log n)^{-1/(2t+1)}$, then

$$\sup_{y\in\mathbb R}|p_n(y) - p_0(y)| = O\Big(\Big(\frac{\log n}{n}\Big)^{t/(2t+1)}\Big) \quad\text{both a.s. and in } L^p(P).$$
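The last statement is the usual bias-variance balance. Inserting $2^{j_n} \simeq (n/\log n)^{1/(2t+1)}$, so that $j_n \asymp \log n$, into the two terms of the previous bound gives

```latex
\sqrt{\frac{2^{j_n} j_n}{n}}
  \asymp \sqrt{\Big(\frac{n}{\log n}\Big)^{\frac{1}{2t+1}}\frac{\log n}{n}}
  = \Big(\frac{\log n}{n}\Big)^{\frac12-\frac{1}{2(2t+1)}}
  = \Big(\frac{\log n}{n}\Big)^{\frac{t}{2t+1}},
\qquad
2^{-t j_n} \asymp \Big(\frac{\log n}{n}\Big)^{\frac{t}{2t+1}},
```

so both terms are of the order of the rate in the last display.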

For the following central limit theorem, we denote by $\rightsquigarrow_{\ell^\infty(\mathbb R)}$ convergence in law for sample-bounded processes in the Banach space $\ell^\infty(\mathbb R)$ of bounded functions on $\mathbb R$, and by $G_P$ the usual $P$-Brownian bridge. See, e.g., Chapter 3 in Dudley (1999).

Theorem 2 Assume that the density $p_0$ of $P$ is a bounded function ($t = 0$) or that $p_0 \in C^t(\mathbb R)$ for some $t$, $0 < t \le r$. Let $j_n$ satisfy $n/(2^{j_n}j_n) \to \infty$ and $\sqrt n\,2^{-j_n(t+1)} \to 0$ as $n \to \infty$. If $F$ is the distribution function of $P$, and setting $F_n^S(s) := \int_{-\infty}^s p_n(y, j_n)\,dy$, then

$$\sqrt n\,(F_n^S - F) \rightsquigarrow_{\ell^\infty(\mathbb R)} G_P.$$

Proof. Given $\varepsilon > 0$, apply Proposition 5 below with $\lambda = \varepsilon$, so that $\|F_n^S - F_n\|_\infty = o_P(1/\sqrt n)$ follows, and use the fact that $\sqrt n(F_n - F)$ converges in law in $\ell^\infty(\mathbb R)$ to $G_P$. ■

We should emphasize that the optimal bandwidth choice $2^{-j_n} \simeq n^{-1/(2t+1)}$ (or, if sup-norm loss is considered, with $n$ replaced by $n/\log n$) is admissible for every $t > 0$ in the last theorem.
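The admissibility claim is a matter of exponents. For $2^{-j_n} \simeq n^{-1/(2t+1)}$ (so $j_n \asymp \log n$), the two conditions of Theorem 2 read as follows; replacing $n$ by $n/\log n$ only introduces logarithmic factors, which do not affect either limit because the powers of $n$ involved are nonzero.

```latex
\frac{n}{2^{j_n} j_n} \asymp \frac{n^{2t/(2t+1)}}{\log n} \to \infty \quad (t > 0),
\qquad
\sqrt{n}\, 2^{-j_n(t+1)} \asymp n^{\frac12 - \frac{t+1}{2t+1}} = n^{-\frac{1}{2(2t+1)}} \to 0 .
```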
3 Estimating Suprema of Empirical Processes



Talagrand's (1996) exponential inequality for empirical processes (see also Ledoux (2001)), which is a uniform Prohorov-type inequality, is not specific about constants. Constants in its Bernstein-type version have been specified by several authors (Massart (2000), Bousquet (2003) and Klein and Rio (2005)). Let $X_i$ be the coordinates of the product probability space $(S, \mathcal S, P)^{\mathbb N}$, where $P$ is any probability measure on $(S, \mathcal S)$, and let $\mathcal F$ be a countable class of measurable functions on $S$ that take values in $[-1/2, 1/2]$, or, if $\mathcal F$ is $P$-centered, in $[-1, 1]$. Let $\sigma \le 1/2$ and $V$ be any two numbers satisfying

$$\sigma^2 \ge \|Pf^2\|_{\mathcal F}, \qquad V \ge n\sigma^2 + 2E\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F}, \qquad (8)$$

where $V$ is also an upper bound for $E\|\sum_{i=1}^n(f(X_i) - Pf)^2\|_{\mathcal F}$ (Klein and Rio (2005)). Then, taking into account that $\sup_{f\in\mathcal F\cup(-\mathcal F)}\sum_{i=1}^n f(X_i) = \sup_{f\in\mathcal F}|\sum_{i=1}^n f(X_i)|$, Bousquet's (2003) version of Talagrand's inequality is as follows: For every $t > 0$,

$$\Pr\Big\{\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F} \ge E\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F} + t\Big\} \le \exp\Big(-\frac{t^2}{2V + (2/3)t}\Big). \qquad (9)$$

In the other direction, the Klein and Rio (2005) result is: For every $t > 0$,

$$\Pr\Big\{\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F} \le E\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F} - t\Big\} \le \exp\Big(-\frac{t^2}{2V + 2t}\Big). \qquad (10)$$
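Both (9) and (10) have the form $\exp(-t^2/(2V + ct))$, with $c = 2/3$ and $c = 2$ respectively, so fixing a tail probability $e^{-x}$ and solving for $t$ amounts to a quadratic equation. The Python helpers below (names are ours, for illustration) evaluate the two bounds and invert them; the positive root also satisfies the familiar estimate $t \le \sqrt{2Vx} + cx$.

```python
import math


def bousquet_tail(t, V):
    """Right-tail bound (9): exp(-t^2 / (2V + (2/3) t))."""
    return math.exp(-t * t / (2.0 * V + 2.0 * t / 3.0))


def klein_rio_tail(t, V):
    """Left-tail bound (10): exp(-t^2 / (2V + 2t))."""
    return math.exp(-t * t / (2.0 * V + 2.0 * t))


def deviation(x, V, c):
    """Positive root of t^2 = x (2V + c t): the deviation t at which
    exp(-t^2 / (2V + c t)) equals e^{-x}."""
    return 0.5 * (c * x + math.sqrt(c * c * x * x + 8.0 * V * x))


V, x = 50.0, 3.0
t_right = deviation(x, V, 2.0 / 3.0)  # deviation above the mean, from (9)
t_left = deviation(x, V, 2.0)         # deviation below the mean, from (10)
print(t_right, t_left, bousquet_tail(t_right, V), math.exp(-x))
```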



These inequalities can be applied in conjunction with an estimate of the expected value obtained via empirical process methods. Here we describe one such result for VC-type classes, i.e., for $\mathcal F$ satisfying the uniform metric entropy condition

$$\sup_Q N(\mathcal F, L^2(Q), \tau) \le \Big(\frac{A}{\tau}\Big)^v, \qquad 0 < \tau \le 1, \qquad (11)$$

with the supremum extending over all Borel probability measures $Q$ on $(S, \mathcal S)$. [We denote here by $N(\mathcal G, L^2(Q), \tau)$ the usual covering numbers of a class $\mathcal G$ of functions by balls of radius less than or equal to $\tau$ in $L^2(Q)$-distance.] Then one has, for every $n$,

$$E\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F} \le 2\Big[15\sqrt{2vn\sigma^2\log\frac{5A}{\sigma}} + 1350\,v\log\frac{5A}{\sigma}\Big], \qquad (12)$$

see Proposition 3 in Giné and Nickl (2008), with a change obtained by using $V$ as in (8) instead of an earlier bound due to Talagrand for $E\|\sum_{i=1}^n(f(X_i) - Pf)^2\|_{\mathcal F}$. Inequalities of this type also have some history (Talagrand (1994), Einmahl and Mason (2000), Giné and Guillou (2001), Giné and Koltchinskii (2006), among others). The constants on the right-hand side of (12) may be far from best possible, but we prefer them over unspecified 'universal' constants.

As is the case for Bernstein's inequality in $\mathbb R$, Talagrand's inequality is especially useful in the Gaussian tail range, and, combining (9) and (12), one can obtain such a 'Gaussian tail' bound for the supremum of the empirical process that depends only on $\sigma$ (similar to a bound in Giné and Guillou (2001)).
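Proposition 1 Let $\mathcal F$ be a countable class of measurable functions that satisfies (11), and is uniformly bounded (in absolute value) by 1/2. Assume further that for some $\lambda > 0$,

$$n\sigma^2 \ge \frac{\lambda^2 v}{2}\log\frac{5A}{\sigma}. \qquad (13)$$

Set $c_1(\lambda) = 2[15 + 1350\lambda^{-1}]$ and let $c_2(\lambda) \ge 1 + 120\lambda^{-1} + 10800\lambda^{-2}$. Then, if

$$c_1(\lambda)\sqrt{2vn\sigma^2\log\frac{5A}{\sigma}} \le t \le \frac32 c_2(\lambda)n\sigma^2,$$

we have

$$\Pr\Big\{\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F} \ge 2t\Big\} \le \exp\Big(-\frac{t^2}{3c_2(\lambda)n\sigma^2}\Big).$$

Proof. Under (13), inequality (12) gives

$$E\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F} \le c_1(\lambda)\sqrt{2vn\sigma^2\log\frac{5A}{\sigma}}, \qquad (14)$$

and (8) implies that we can take $V = c_2(\lambda)n\sigma^2$. Now the result follows from (9), taking into account that in the range of $t$'s considered we have $E\|\sum_{i=1}^n(f(X_i) - Pf)\|_{\mathcal F} \le t \le 3V/2$, so that (9) becomes

$$\Pr\Big\{\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F} \ge 2t\Big\} \le \exp\Big(-\frac{t^2}{3V}\Big). \qquad (15)$$

■

The constants here may be too large for some applications, but they are not so in situations where $\lambda$ can be taken very large, in particular in asymptotic considerations. [Then $c_1(\lambda) \to 30$ and $c_2(\lambda) \to 1$ as $\lambda \to \infty$.]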
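The constants $c_1(\lambda)$ and $c_2(\lambda)$ rest on two elementary consequences of (13): $v\log(5A/\sigma) \le \lambda^{-1}\sqrt{2vn\sigma^2\log(5A/\sigma)}$ and $\sqrt{2vn\sigma^2\log(5A/\sigma)} \le 2n\sigma^2/\lambda$. The Python sketch below (helper names are hypothetical) checks numerically, for a few parameter values satisfying (13), that the right-hand side of (12) is then at most $c_1(\lambda)\sqrt{2vn\sigma^2\log(5A/\sigma)}$, and that $n\sigma^2$ plus twice that right-hand side is at most $(1 + 120\lambda^{-1} + 10800\lambda^{-2})n\sigma^2$.

```python
import math


def c1(lam):
    """c_1(lambda) = 2 [15 + 1350 / lambda] from Proposition 1."""
    return 2.0 * (15.0 + 1350.0 / lam)


def c2_min(lam):
    """Smallest admissible c_2(lambda) = 1 + 120/lambda + 10800/lambda^2."""
    return 1.0 + 120.0 / lam + 10800.0 / lam ** 2


def check_constants(n, sigma, v, A, lam, rtol=1e-12):
    """Numerical check of the two inequalities behind Proposition 1, assuming (13)."""
    L = math.log(5.0 * A / sigma)
    ns2 = n * sigma ** 2
    if ns2 < lam ** 2 * v * L / 2.0 * (1.0 - rtol):
        raise ValueError("condition (13) is not satisfied")
    root = math.sqrt(2.0 * v * ns2 * L)
    rhs12 = 2.0 * (15.0 * root + 1350.0 * v * L)  # right-hand side of (12)
    expectation_ok = rhs12 <= c1(lam) * root * (1.0 + rtol)
    variance_ok = ns2 + 2.0 * rhs12 <= c2_min(lam) * ns2 * (1.0 + rtol)
    return expectation_ok and variance_ok


sigma, v, A = 0.1, 2.0, 4.0
L = math.log(5.0 * A / sigma)
print(check_constants(10 ** 6, sigma, v, A, 5.0))
print(c1(1e9), c2_min(1e9))  # close to 30 and 1, as stated above
```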



3.1 Estimating the size of empirical processes by Rademacher averages 

The constants one could obtain from Proposition 1 are not satisfactory for certain applications in adaptive estimation. We now propose a remedy for this problem, inspired by a nice idea of Koltchinskii (2001) and Bartlett, Boucheron and Lugosi (2002), consisting in replacing the expectation of the supremum of an empirical process by the supremum of the associated Rademacher process. To be more precise, these authors obtain a purely data-driven stochastic estimate of the supremum of an empirical process and apply it to problems in risk minimization and model selection. An inequality of this type (see Koltchinskii (2006), page 2602) is

$$\Pr\Big\{\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F} \ge 2\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} + 3t\Big\} \le \exp\Big(-\frac{2t^2}{n}\Big), \qquad (16)$$

where $\varepsilon_i$, $i \in \mathbb N$, are i.i.d. Rademacher random variables, independent of the $X_i$'s, all defined as coordinates on a large product probability space. Note that this bound does not take the variance $V$ in (8) into account, but in the applications to density estimation that we have in mind, $V$ is much smaller than $n$ (it is of the order $n2^{-j_n}$, $j_n \to \infty$). We need a similar inequality, with the quantity $n$ in the bound replaced by $V$, valid in a large enough range of $V$'s.

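It will be convenient to use the following well-known symmetrization inequality (e.g., Dudley (1999), p. 343):

$$\frac12 E\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} - \frac{\sqrt n}{2}\|Pf\|_{\mathcal F} \le E\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F} \le 2E\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F}. \qquad (17)$$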
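The right-hand inequality in (17) can be seen at work in a small simulation. The Python sketch below uses a hypothetical toy class, $f_s(x) = 1\{x \le s\} - 1/2$ for $s$ on a grid in $(0, 1)$, with $X_i$ uniform on $(0, 1)$, so that the functions take values in $[-1/2, 1/2]$ as required above; it compares Monte Carlo estimates of $E\|\sum_i(f(X_i) - Pf)\|_{\mathcal F}$ and $E\|\sum_i\varepsilon_i f(X_i)\|_{\mathcal F}$.

```python
import random


def sup_empirical(sample, grid):
    """sup over s of |sum_i (f_s(X_i) - P f_s)| for f_s(x) = 1{x <= s} - 1/2 and
    X_i ~ Uniform(0, 1), so that P f_s = s - 1/2."""
    n = len(sample)
    return max(abs(sum(1 for x in sample if x <= s) - n * s) for s in grid)


def sup_rademacher(sample, eps, grid):
    """sup over s of |sum_i eps_i f_s(X_i)|."""
    return max(
        abs(sum(e * ((1.0 if x <= s else 0.0) - 0.5) for x, e in zip(sample, eps)))
        for s in grid
    )


def symmetrization_check(n=100, reps=200, m=20, seed=0):
    """Monte Carlo estimates of E||sum (f(X_i) - Pf)|| and E||sum eps_i f(X_i)||."""
    rng = random.Random(seed)
    grid = [k / m for k in range(1, m)]
    emp = rad = 0.0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        eps = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        emp += sup_empirical(sample, grid)
        rad += sup_rademacher(sample, eps, grid)
    return emp / reps, rad / reps


emp, rad = symmetrization_check()
print(f"E sup |empirical| ~ {emp:.2f},  2 E sup |Rademacher| ~ {2 * rad:.2f}")
```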



The following exponential bound is the Bernstein-type analogue of (16). Denote by $E_\varepsilon$ expectation w.r.t. the Rademacher variables only.

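Proposition 2 Let $\mathcal F$ be a countable class of measurable functions, uniformly bounded (in absolute value) by 1/2. Then, for every $t > 0$,

$$\Pr\Big\{\Big\|\sum_{i=1}^n(f(X_i) - Ef(X))\Big\|_{\mathcal F} > 2\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} + 3t\Big\} \le 2\exp\Big(-\frac{t^2}{2V + 2t}\Big) \qquad (18)$$

as well as

$$\Pr\Big\{\Big\|\sum_{i=1}^n(f(X_i) - Ef(X))\Big\|_{\mathcal F} > 2E_\varepsilon\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} + 3t\Big\} \le 2\exp\Big(-\frac{t^2}{2V + 2t}\Big), \qquad (19)$$

where $V = n\sigma^2 + 4E\|\sum_{i=1}^n\varepsilon_i f(X_i)\|_{\mathcal F}$.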
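Proof. We have

$$\Pr\Big\{\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F} > 2\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} + 3t\Big\} \le \Pr\Big\{\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F} > 2E\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} + t\Big\} + \Pr\Big\{\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} < E\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} - t\Big\}.$$

For the first term, combining (17) with (9) gives

$$\Pr\Big\{\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F} > 2E\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} + t\Big\} \le \exp\Big(-\frac{t^2}{2V + (2/3)t}\Big),$$

since, by (17), $V \ge n\sigma^2 + 2E\|\sum_{i=1}^n(f(X_i) - Pf)\|_{\mathcal F}$. For the second term, note that (10) applies to the randomized sums $\sum_{i=1}^n\varepsilon_i f(X_i)$ as well by just taking the class of functions

$$\mathcal G = \{g(\tau, x) = \tau f(x) : f \in \mathcal F\}, \qquad \tau \in \{-1, 1\},$$

instead of $\mathcal F$, and the probability measure $\tilde P = 2^{-1}(\delta_{-1} + \delta_1) \times P$ instead of $P$. Hence

$$\Pr\Big\{\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} < E\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} - t\Big\} \le \exp\Big(-\frac{t^2}{2V + 2t}\Big), \qquad (20)$$

since $V \ge n\sigma^2 + 2E\|\sum_{i=1}^n\varepsilon_i f(X_i)\|_{\mathcal F}$. Combining the bounds, and using $2V + (2/3)t \le 2V + 2t$, completes the proof of (18).

It remains to prove (19). Let $\mathcal G$, $\tilde P$ be as above, let $Y_i = (\varepsilon_i, X_i)$, and note that $\tilde P$ is the law of $Y_i$. By convexity,

$$Ee^{-sE_\varepsilon\|\sum_{i=1}^n\varepsilon_i f(X_i)\|_{\mathcal F}} \le Ee^{-s\|\sum_{i=1}^n\varepsilon_i f(X_i)\|_{\mathcal F}} = Ee^{-s\|\sum_{i=1}^n g(Y_i)\|_{\mathcal G}}$$

for all $s \ge 0$. The Klein and Rio (2005) version (10) of Talagrand's inequality is in fact established by estimating the Laplace transform $Ee^{-s\|\sum_{i=1}^n g(Y_i)\|_{\mathcal G}}$, and Theorem 1.2a in Klein and Rio (2005) implies

$$Ee^{-sE_\varepsilon\|\sum_{i=1}^n\varepsilon_i f(X_i)\|_{\mathcal F}} \le \exp\Big(-sE\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} + \frac V9\big(e^{3s} - 3s - 1\big)\Big)$$

for $V \ge n\sigma^2 + 2E\|\sum_{i=1}^n g(Y_i)\|_{\mathcal G}$, which, by their proof of the implication (a) $\Rightarrow$ (c) in that theorem, gives

$$\Pr\Big\{E_\varepsilon\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} < E\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} - t\Big\} \le \exp\Big(-\frac{t^2}{2V + 2t}\Big).$$

The proof of (19) now follows as in the previous case. ■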



For $\mathcal F$ of VC type, the moment bound (12) is usually proved as a consequence of a bound for the Rademacher process. In fact, the proof of Proposition 3 in Giné and Nickl (2008) shows

$$E\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} \le 15\sqrt{2vn\sigma^2\log\frac{5A}{\sigma}} + 1350\,v\log\frac{5A}{\sigma}, \qquad (21)$$

where $\sigma$ is as in (8), which we use in the following corollary, together with the previous proposition. The constant $c_2(\lambda)$ in the exponent below is still potentially large, but tends to one as $\lambda \to \infty$.

Corollary 1 Let $\mathcal F$ be a countable class of measurable functions that satisfies (11), and assume it to be uniformly bounded (in absolute value) by 1/2. Assume further (13) for some $\lambda > 0$. Then for

$$0 < t \le \frac{1}{20}c_2(\lambda)n\sigma^2,$$

with $c_2(\lambda)$ as in Proposition 1, we have

$$\Pr\Big\{\Big\|\sum_{i=1}^n(f(X_i) - Pf)\Big\|_{\mathcal F} > 2\Big\|\sum_{i=1}^n\varepsilon_i f(X_i)\Big\|_{\mathcal F} + 3t\Big\} \le 2\exp\Big(-\frac{t^2}{2.1\,c_2(\lambda)n\sigma^2}\Big),$$

and the same inequality holds if $\|\sum_{i=1}^n\varepsilon_i f(X_i)\|_{\mathcal F}$ is replaced by its $E_\varepsilon$ expectation.

Proof. By (13) and (21), we have $V \le c_2(\lambda)n\sigma^2$, and the condition on $t$ together with (18) (respectively (19)) gives the result. ■
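Read as a data-driven bound, Corollary 1 says that, with probability at least $1 - 2e^{-x}$, the supremum of the empirical process is at most $2\|\sum_i\varepsilon_i f(X_i)\|_{\mathcal F} + 3t$ with $t = \sqrt{2.1\,c_2(\lambda)n\sigma^2 x}$, provided this $t$ does not exceed $c_2(\lambda)n\sigma^2/20$. A Python sketch of this computation (the function name is hypothetical; $\sigma$ and $c_2(\lambda)$ must be supplied by the user, and the numbers below are illustrative only):

```python
import math


def rademacher_bound(sup_rademacher, n, sigma, c2, x):
    """Upper bound 2 * sup_rademacher + 3 t from Corollary 1, valid with probability
    at least 1 - 2 exp(-x), where t^2 = 2.1 c2 n sigma^2 x and t <= c2 n sigma^2 / 20."""
    t = math.sqrt(2.1 * c2 * n * sigma ** 2 * x)
    if t > c2 * n * sigma ** 2 / 20.0:
        raise ValueError("x too large: t leaves the range allowed by Corollary 1")
    return 2.0 * sup_rademacher + 3.0 * t


# Illustrative numbers: n sigma^2 = 10^4, c2 = 1.5, confidence level 1 - 2 e^{-3}.
print(rademacher_bound(100.0, 10 ** 6, 0.1, 1.5, 3.0))
```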
4 The adaptive estimation procedures



We will now show how these findings apply to adaptive estimation of a distribution function and its density in sup-norm loss. The Rademacher processes we will consider are the following: First, generate a Rademacher sequence $\varepsilon_i$, $i = 1, \dots, n$, independent of the sample, and set, for $j \le l$,

$$R(n, j) := \Big\|\frac1n\sum_{i=1}^n\varepsilon_i K_j(\cdot, X_i)\Big\|_\infty \quad\text{and}\quad T(n, j, l) := \Big\|\frac1n\sum_{i=1}^n\varepsilon_i\big(K_j(\cdot, X_i) - K_l(\cdot, X_i)\big)\Big\|_\infty, \qquad (22)$$

where $K_j$ is the kernel of the wavelet projection $\pi_j$ onto $V_j$ (both for Battle-Lemarié and compactly supported wavelets). In both cases, these are suprema of fixed random functions that depend only on known quantities.
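For the Haar wavelet ($\phi = 1_{[0,1)}$, used here only as an illustration of a compactly supported choice), $K_j(\cdot, X_i)$ equals $2^j$ on the dyadic interval of length $2^{-j}$ containing $X_i$ and vanishes elsewhere, so the suprema in (22) are maxima over finitely many dyadic bins and can be computed exactly. The Python sketch below does this; approximating the conditional expectation $E_\varepsilon R(n, j)$ by averaging over independent Rademacher draws is our simplification, not a prescription from the text.

```python
import math
import random
from collections import defaultdict


def bin_sums(sample, eps, level):
    """Sums of eps_i over the dyadic bins [k 2^-level, (k+1) 2^-level) holding some X_i."""
    sums = defaultdict(float)
    for x, e in zip(sample, eps):
        sums[math.floor(2 ** level * x)] += e
    return sums


def R(sample, eps, j):
    """R(n, j) from (22) for the Haar kernel K_j(y, x) = 2^j 1{same level-j bin}."""
    return max(abs(s) for s in bin_sums(sample, eps, j).values()) * 2 ** j / len(sample)


def T(sample, eps, j, l):
    """T(n, j, l) from (22), j <= l. On a level-l bin b inside the level-j bin P,
    n^{-1} sum_i eps_i (K_j - K_l)(y, X_i) equals (2^j S_j(P) - 2^l S_l(b)) / n."""
    S_j, S_l = bin_sums(sample, eps, j), bin_sums(sample, eps, l)
    best, occupied = 0.0, defaultdict(int)
    for b, s in S_l.items():
        parent = b >> (l - j)  # level-j bin containing the level-l bin b
        occupied[parent] += 1
        best = max(best, abs(2 ** j * S_j[parent] - 2 ** l * s))
    for parent, s in S_j.items():
        if occupied[parent] < 2 ** (l - j):  # an empty child bin, where S_l = 0
            best = max(best, abs(2 ** j * s))
    return best / len(sample)


def expected_R(sample, j, draws=200, seed=0):
    """Monte Carlo stand-in for the conditional expectation E_eps R(n, j)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        eps = [rng.choice((-1.0, 1.0)) for _ in range(len(sample))]
        total += R(sample, eps, j)
    return total / draws


rng = random.Random(1)
sample = [rng.random() for _ in range(300)]
eps = [rng.choice((-1.0, 1.0)) for _ in range(300)]
print(R(sample, eps, 3), expected_R(sample, 3), T(sample, eps, 3, 5))
```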

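To construct the estimators, we first need a grid indexing the spaces $V_j$ onto which we project $P_n$. For $r \ge 1$, $n > 1$, choose integers $j_{\min} := j_{\min,n}$ and $j_{\max} := j_{\max,n}$ such that $0 < j_{\min} < j_{\max}$,

$$2^{j_{\min}} \simeq \Big(\frac{n}{\log n}\Big)^{1/(2r+1)} \quad\text{and}\quad 2^{j_{\max}} \simeq \frac{n}{(\log n)^2},$$

and set

$$\mathcal J := \mathcal J_n := [j_{\min}, j_{\max}] \cap \mathbb N.$$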

The number of elements in this grid is of order $\log n$. We will consider several preliminary estimators $\hat j_n$, $\hat j_n^{\,\varepsilon}$, $\bar j_n$ and $\bar j_n^{\,\varepsilon}$ of the resolution level, and we discuss the main differences among them below. Let $p_n(j)$ be as in (6) or (7). First, we set

$$\hat j_n := \min\Big\{j \in \mathcal J : \|p_n(j) - p_n(l)\|_\infty \le T(n, j, l) + 7\|\Phi\|_2\sqrt{\frac{\|p_n(j_{\max})\|_\infty 2^l l}{n}}\ \ \forall\, l > j,\ l \in \mathcal J\Big\}, \qquad (23)$$

where the function $\Phi$ is as in (3), and we discuss an explicit way to construct $\Phi$ in Remark 5 below. If the minimum does not exist, we set $\hat j_n$ equal to $j_{\max}$. Further, we define $\hat j_n^{\,\varepsilon}$ as the same minimum but with $T(n, j, l)$ replaced by its Rademacher expectation $E_\varepsilon T(n, j, l)$. An alternative estimator of the resolution level is

$$\bar j_n := \min\Big\{j \in \mathcal J : \|p_n(j) - p_n(l)\|_\infty \le (B(\phi) + 1)R(n, l) + 7\|\Phi\|_2\sqrt{\frac{\|p_n(j_{\max})\|_\infty 2^l l}{n}}\ \ \forall\, l > j,\ l \in \mathcal J\Big\}, \qquad (24)$$

where $B(\phi)$ is a bound, uniform in $j$, for the operator norm in $L^\infty(\mathbb R)$ of the projection $\pi_j$, see Remark 3 below. Again, if the minimum does not exist, we set $\bar j_n$ equal to $j_{\max}$. We also define $\bar j_n^{\,\varepsilon}$ by replacing $R(n, l)$ by $E_\varepsilon R(n, l)$ in (24).
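A minimal Python sketch of the selection rule (24) in the Haar case, under assumptions that are all ours: $p_n(j)$ is the dyadic histogram obtained from (6) with the Haar wavelet, for which sup-norm differences are exact maxima over bins; the grid takes $\log_2$ of the rates above rounded to integers; and $B(\phi)$ and $\|\Phi\|_2$ are passed in as parameters. For Haar the projection $\pi_j$ averages over dyadic bins, so $B(\phi) = 1$ is a valid choice; the default for $\|\Phi\|_2$ is a placeholder, not a value from Remark 5. A single Rademacher sequence is reused for all $l$, as in (22).

```python
import math
import random
from collections import Counter, defaultdict


def haar_heights(sample, j):
    """Heights of the Haar estimate p_n(., j): 2^j * count / n on each occupied
    dyadic bin [k 2^-j, (k+1) 2^-j); the estimate is zero elsewhere."""
    n = len(sample)
    counts = Counter(math.floor(2 ** j * x) for x in sample)
    return {k: 2 ** j * c / n for k, c in counts.items()}


def sup_diff(sample, j, l):
    """||p_n(j) - p_n(l)||_inf for j <= l; both estimates are constant on level-l bins."""
    pj, pl = haar_heights(sample, j), haar_heights(sample, l)
    best, occupied = 0.0, defaultdict(int)
    for b, h in pl.items():
        parent = b >> (l - j)  # level-j bin containing level-l bin b
        occupied[parent] += 1
        best = max(best, abs(pj[parent] - h))
    for parent, h in pj.items():
        if occupied[parent] < 2 ** (l - j):  # an empty child bin, where p_n(l) = 0
            best = max(best, h)
    return best


def rademacher_R(sample, eps, l):
    """R(n, l) from (22) for the Haar kernel: a maximum over occupied level-l bins."""
    sums = defaultdict(float)
    for x, e in zip(sample, eps):
        sums[math.floor(2 ** l * x)] += e
    return 2 ** l * max(abs(s) for s in sums.values()) / len(sample)


def resolution_grid(n, r):
    """Hypothetical integer grid with 2^j_min ~ (n/log n)^(1/(2r+1)), 2^j_max ~ n/(log n)^2."""
    j_min = max(1, round(math.log2((n / math.log(n)) ** (1.0 / (2 * r + 1)))))
    j_max = max(j_min + 1, round(math.log2(n / math.log(n) ** 2)))
    return list(range(j_min, j_max + 1))


def select_level(sample, grid, B=1.0, phi_l2=1.0, seed=0):
    """Rule (24): the smallest j in the grid passing all tests against l > j.
    B and phi_l2 stand for B(phi) and ||Phi||_2 (phi_l2 default is a placeholder)."""
    n = len(sample)
    rng = random.Random(seed)
    eps = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # one Rademacher sequence
    j_max = max(grid)
    sup_p_max = max(haar_heights(sample, j_max).values())  # ||p_n(j_max)||_inf
    for j in sorted(grid):
        if all(
            sup_diff(sample, j, l)
            <= (B + 1.0) * rademacher_R(sample, eps, l)
            + 7.0 * phi_l2 * math.sqrt(sup_p_max * 2 ** l * l / n)
            for l in grid
            if l > j
        ):
            return j
    return j_max  # not reached: the test at j_max is vacuous


rng = random.Random(2)
sample = [rng.random() for _ in range(2000)]
grid = resolution_grid(len(sample), r=1)
print(grid, select_level(sample, grid))
```

The returned level always lies in the grid; which level is chosen for a given sample depends on the placeholder constants.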

Before we state the main result, we briefly discuss these procedures: The data-driven resolution levels $\bar j_n$ and $\bar j_n^{\,\varepsilon}$ in (24) are based on tests that use Rademacher analogues of the usual thresholds in Lepski's method: Starting with $j_{\min}$, the main contribution to $\|p_n(j) - p_n(l)\|_\infty$ is the bias $\|Ep_n(j) - p_0\|_\infty$. The procedure should stop when the 'variance term' $\|p_n(l) - Ep_n(l)\|_\infty$ starts to dominate. Since this is an unknown quantity, and since we know no good nonrandom upper bound for it, we estimate it by the supremum of the associated Rademacher process, i.e., by $R(n, l)$, or by its Rademacher expectation. The constant $B(\phi)$ is necessary to correct for the lack of monotonicity of the $R(n, l)$'s in the resolution level $l$.

The estimators $\hat j_n$ and $\hat j_n^{\,\varepsilon}$ in (23) are somewhat more refined, but also slightly more complicated: They try to take advantage of the fact that in the 'small bias' domain,

$$\|p_n(j) - p_n(l)\|_\infty \approx \Big\|\frac1n\sum_{i=1}^n\big[(K_j - K_l)(\cdot, X_i) - E(K_j - K_l)(\cdot, X)\big]\Big\|_\infty$$

should not exceed its Rademacher symmetrization

$$T(n, j, l) = \Big\|\frac1n\sum_{i=1}^n\varepsilon_i(K_j - K_l)(\cdot, X_i)\Big\|_\infty,$$

or its conditional expectation $E_\varepsilon T(n, j, l)$, using the deviation inequality from Corollary 1.

Yet another way of viewing these resolution levels is in terms of model selection: One starts with the smallest model $V_{j_{\min}}$ and compares it to a nested sequence of models $\{V_j\}_{j\in\mathcal J}$, proceeding to larger models until reaching the first $V_j$ for which all relevant blocks of wavelet coefficients between $j$ and $j_{\max}$ are insignificant as compared to the corresponding Rademacher penalty.

We now state the main result, whose proof is deferred to the next section. As usual, we say that a wavelet basis is $s$-regular, $s \in \mathbb N \cup \{0\}$, if either the father wavelet $\phi$ has $s$ weak derivatives contained in $L^p(\mathbb R)$ for some $p \ge 1$, or if the mother wavelet $\psi$ satisfies $\int x^\alpha\psi(x)\,dx = 0$ for $\alpha = 0, \dots, s$. Note that any compactly supported element of $C^s(\mathbb R)$, $s > 0$, is of bounded $(1/s)$-variation, so that the $p$-variation condition in the following theorem is satisfied, e.g., for all Daubechies wavelets. The assumption of uniform continuity in the following theorem can be relaxed at the expense of slightly modifying the definition of $\hat j_n$, see Remark 1 below. The estimators below achieve the optimal rate of convergence for estimating $p_0$ in sup-norm loss in the minimax sense (over Hölder balls), cf., e.g., Korostelev and Nussbaum (1999) for optimality of these rates.

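Theorem 3 Let $X_1,\dots,X_n$ be i.i.d. on $\mathbb R$ with common law $P$ that possesses a uniformly continuous density $p_0$. Let $p_n(j) := p_n(y,j)$ be the linear projection estimator defined above, where $\phi$ is either compactly supported, of bounded $p$-variation ($p<\infty$) and $(r-1)$-regular, or $\phi=\phi_r$ equals a Battle-Lemarié wavelet. Let the sequence $\{j_n\}_{n\in\mathbb N}$ be any of the four sequences of data-driven resolution levels defined via (21) and (23), and let $F_n(j_n)(t) = \int_{-\infty}^t p_n(y,j_n)\,dy$. Then

$$\sqrt n\,\big(F_n(j_n)-F\big) \rightsquigarrow \mathbb G_P \quad\text{in } \ell^\infty(\mathbb R), \qquad (25)$$

where $\mathbb G_P$ is the $P$-Brownian bridge. The convergence is uniform over the set of all probability measures $P$ on $\mathbb R$ with densities $p_0$ bounded by a fixed constant, in any distance that metrizes convergence in law. Furthermore, if $\mathcal C$ is any precompact subset of $C(\mathbb R)$, then

$$\sup_{p_0\in\mathcal C} E\sup_{y\in\mathbb R}|p_n(y,j_n)-p_0(y)| = o(1). \qquad (26)$$

If, in addition, $p_0\in C^t(\mathbb R)$ for some $0<t<r$, then also

$$\sup_{p_0:\,\|p_0\|_{t,\infty}\le D} E\sup_{y\in\mathbb R}|p_n(y,j_n)-p_0(y)| = O\Big(\Big(\frac{\log n}{n}\Big)^{t/(2t+1)}\Big). \qquad (27)$$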



Remark 1 Relaxing the uniform continuity assumption. The assumption of uniform continuity of the density of $F$ can be relaxed by modifying the definition of $j_n$ along the lines of Giné and Nickl (2008). The idea is to constrain all candidate estimators to lie in a ball of size $o(1/\sqrt n)$ around the empirical distribution function $F_n$, so that (25) holds automatically. Formally, this can be done by adding the requirement

$$\sup_{t\in\mathbb R}\Big|\int_{-\infty}^t p_n(y,j)\,dy - F_n(t)\Big| \le \frac{1}{\sqrt n\,\log n}$$

in each test in (21) or (23). If this requirement does not even hold for $j_{\max}$, this can be seen as evidence that $F$ has no density. One then just uses $F_n$ as the estimator, so as to obtain at least the functional CLT. If $F$ has a bounded density, one can use the exponential bound in Proposition 8 in the proof to control the rejection probabilities of these tests in the 'small bias' domain $j_n>j^*$. Theorem 3 can then still be proved for this procedure, without any assumptions on $F$. See Giné and Nickl (2008) for more details on this procedure and its proof.

Remark 2 The constant $\|\Phi\|_2$. Once the wavelet $\phi$ has been chosen, $j_n$ is purely data-driven, since the function $\Phi$ depends only on $\phi$. For the Haar basis ($\phi = 1_{[0,1)}$) we can take $\Phi = 1_{[0,1)}$, because in this case $K(x,y)\le 1_{[0,1)}(|x-y|)$, so that $\|\Phi\|_2 = 1$.

A general way to obtain majorizing kernels $\Phi$ is described in Section 8.6 of HKPT (1998) as follows. Let $\tilde\Phi$ be a non-increasing function in $L^1(\mathbb R_+)$ such that $|\phi(u)|\le\tilde\Phi(|u|)$. Then

$$|K(x,y)| \le C(\tilde\Phi)\,\tilde\Phi(\delta|x-y|/2) =: \Phi(|x-y|),$$

where $\delta<1/4$ is such that $\tilde\Phi(\delta/2)>0$. The constant $C(\tilde\Phi)$, which has $\tilde\Phi(\delta/2)$ in its denominator, is given explicitly in Section 8.6 of HKPT (1998). For compactly supported $\phi$, a crude choice of $\tilde\Phi$ is $\|\phi\|_\infty$ times the indicator of its support, but numerical methods might give much better estimates.

For Battle-Lemarié wavelets, the spline representation of the projection kernel is again useful for estimating $\|\Phi\|_2$. For example, if $r=2$ (linear splines), one writes as in Huang and Studden (1993) (cf. also Lemma 1 below)

$$K(x,y) = \sum_{k,l} b_{kl}N_{0,2}(x-k)N_{0,2}(y-l) = c\sum_{k,l}\lambda^{|k-l|}N_{0,2}(x-k)N_{0,2}(y-l),$$

with $c=\sqrt 3$ and $\lambda = -2+\sqrt 3$. Since $N_{0,2} = 1_{[0,1)}*1_{[0,1)}$, the kernel $K(x,y)$ is easily seen to be majorized in absolute value by $\Phi(|x-y|)$, where $\Phi(|u|) = 4c|\lambda|^{(|u|-2)_+}$. This gives the (not necessarily sharp) bound $\|\Phi\|_2<15.5$. For higher-order spline wavelets, similar computations apply. Again, numerical methods might be preferable here.
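The constants in Remark 2 can be checked numerically. For $r=2$ the Gram matrix (33) of the hat functions is tridiagonal, with $a_{kk}=\int N_{0,2}^2 = 2/3$ and $a_{k,k\pm 1} = \int N_{0,2}(x)N_{0,2}(x-1)\,dx = 1/6$. The sketch below inverts a large finite section of this matrix (the size is an arbitrary choice) and compares a central row with $\sqrt 3\lambda^{|k-l|}$.

It also evaluates in closed form the $L^2$ norm of the majorant $\Phi(|u|) = 4c|\lambda|^{(|u|-2)_+}$. This majorant follows from three facts: for fixed $x,y$, at most two translates $N_{0,2}(x-k)$ and two translates $N_{0,2}(y-l)$ are nonzero; each translate is bounded by 1; and nonzero terms require $|k-l|\ge|x-y|-2$.

```python
import numpy as np

LAM = -2.0 + np.sqrt(3.0)   # lambda from Remark 2
C = np.sqrt(3.0)            # c from Remark 2


def gram_inverse_row(size=201, width=10):
    """Central row b_{k,l}, |k - l| <= width, of the inverse of a finite section of (33), r = 2."""
    a = (np.diag(np.full(size, 2.0 / 3.0))
         + np.diag(np.full(size - 1, 1.0 / 6.0), 1)
         + np.diag(np.full(size - 1, 1.0 / 6.0), -1))
    b = np.linalg.inv(a)
    mid = size // 2
    return b[mid, mid - width: mid + width + 1]


def majorant_l2_norm():
    """||Phi||_2 for Phi(|u|) = 4c|lam|^{(|u|-2)_+}: a flat part on |u| <= 2 plus two geometric tails."""
    flat = (4 * C) ** 2 * 4.0                                    # integral over [-2, 2]
    tails = 2 * (4 * C) ** 2 / (2.0 * np.log(1.0 / abs(LAM)))   # 2 * int_0^inf |lam|^{2v} dv
    return float(np.sqrt(flat + tails))


predicted = C * LAM ** np.abs(np.arange(-10, 11))
print(np.max(np.abs(gram_inverse_row() - predicted)))      # truncation and rounding error only
print(C * (1 + LAM) / (1 - LAM), C * (1 + abs(LAM)) / (1 - abs(LAM)))  # sum of b_kl, sum of |b_kl|
print(majorant_l2_norm())
```

The two printed sums are $\sum_l b_{kl}=1$ and $\sum_l|b_{kl}|=3$. The first reflects that the projection reproduces constants. The second gives a triangle-inequality bound of 3 for the operator norm of the linear-spline projection.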

Remark 3 The constant $B(\phi)$. To construct $j_n$ one requires knowledge of the constant $B(\phi)$ that bounds the operator norm $\|\pi_j\|'_\infty$ of $\pi_j$ viewed as an operator on $L^\infty(\mathbb R)$. A simple bound is obtained as follows. For any $f\in L^\infty(\mathbb R)$ we have, by (3),

$$\sup_{x\in\mathbb R}\Big|\int K_j(x,y)f(y)\,dy\Big| \le \|\Phi\|_1\|f\|_\infty,$$

that is, $\|\pi_j\|'_\infty\le\|\Phi\|_1$. In combination with the previous remark, one readily obtains possible values for $B(\phi)$. For instance, for the Haar wavelet, $B(\phi)\le 1$.

For spline wavelets, other methods are available. For example, for Battle-Lemarié wavelets arising from linear B-splines, $\|\pi_j\|'_\infty$ is bounded by 3, and Shadrin (2001, p. 135) conjectures the bound $2r-1$ for general order $r$. See DeVore and Lorentz (1993, Chapter 13.4), Shadrin (2001) and references therein for more information.
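For linear splines, the bound 3 can be compared with a direct numerical evaluation of $\sup_x\int|\kappa(x,y)|\,dy$. This quantity is the $L^\infty$ operator norm of the projection and, by scaling, does not depend on $j$.

The sketch assumes the representation $\kappa(x,y) = \sqrt 3\sum_{k,l}\lambda^{|k-l|}N_{0,2}(x-k)N_{0,2}(y-l)$, with $\lambda=-2+\sqrt 3$, from Remark 2. It truncates the sum over $l$ and uses a Riemann sum in $y$; the truncation width and grid are arbitrary choices. Since $\sum_l|b_{kl}| = 3$ and $\int N_{0,2}=1$, the triangle inequality already gives the value 3. The computed value should therefore not exceed 3, and it is at least 1 because $\int\kappa(x,y)\,dy=1$.

```python
import numpy as np

LAM = -2.0 + np.sqrt(3.0)


def hat(x):
    """Linear B-spline N_{0,2}, supported on [0, 2] with peak 1 at x = 1."""
    return np.clip(1.0 - np.abs(np.asarray(x, dtype=float) - 1.0), 0.0, None)


def lebesgue_function(x, width=30, per_unit=400):
    """Riemann-sum approximation of int |kappa(x, y)| dy for the linear-spline kernel."""
    k0 = int(np.floor(x))
    ks = np.array([k0 - 1, k0])                  # only these k have N_{0,2}(x - k) != 0
    ls = np.arange(k0 - width, k0 + width + 1)
    # kappa(x, .) = sum_l coef_l N_{0,2}(. - l), coef_l = sqrt(3) sum_k LAM^{|k-l|} N_{0,2}(x - k)
    coef = np.sqrt(3.0) * (LAM ** np.abs(ls[None, :] - ks[:, None])
                           * hat(x - ks)[:, None]).sum(axis=0)
    y = np.linspace(k0 - width, k0 + width + 2, (2 * width + 2) * per_unit + 1)
    values = (coef[None, :] * hat(y[:, None] - ls[None, :])).sum(axis=1)
    return float(np.sum(np.abs(values)) * (y[1] - y[0]))


# kappa(x + 1, y + 1) = kappa(x, y), so it suffices to scan x over one period.
grid = np.linspace(0.0, 1.0, 41)
norm_estimate = max(lebesgue_function(x) for x in grid)
print(norm_estimate)
```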



4.1 Risk and Oracle-type inequalities for Haar wavelets 

Theorem 3 is asymptotic in nature, and a natural question is how large the constants in the convergence rate (27) are. One way to address this question is to compare the risk of the adaptive estimator with the risk of optimal linear estimators that could be constructed if more were known about $p_0$. While our methods allow such comparisons, we should note in advance that neither the randomization techniques nor the relatively simple model selection procedure employed here are likely to produce optimal constants in these comparisons. Nevertheless, the constants obtained here are much better than what would be possible using moment inequalities for empirical processes directly (as in (12)), or any other method known to us. To reduce technicalities, we restrict ourselves here to the Haar wavelet $\phi=\phi_1$. All the results below could also be proved (with modified constants) for the wavelets considered in Theorem 3.

We first compare with an 'oracle' that only knows that $p_0\in C^t(\mathbb R)$, together with a bound on its Hölder norm. In this case, one could choose $j^*(t)$ so that the 'variance' term $E\|p_n(j)-Ep_n(j)\|_\infty$ and the bias term $\|Ep_n(j)-p_0\|_\infty\le B(j,p_0)$ balance, where $B(j,p_0)$ is defined in (45) below. Here, several possibilities arise. For instance, for the linear Haar-wavelet estimator,

$$\lim_{n\to\infty}\sqrt{\frac{n}{2^{j_n}j_n}}\,\sup_{y\in\mathbb R}|p_n(y,j_n)-Ep_n(y,j_n)| = \sqrt{2\log 2\,\|p_0\|_\infty}\quad\text{a.s. and in } L^p(P) \qquad (28)$$

(Theorem 2 in Giné and Nickl (2007)). Hence a possible choice of $j^*$ is the resolution level that balances $B(j,p_0)$ with $\sqrt{2\log 2\,\|p_0\|_\infty\,2^j j/n}$; see (49) below.
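The balancing of bias and variance can be illustrated numerically. The sketch treats $j$ as a real variable and solves $b\,2^{-jt} = \sqrt{2\log 2\,\|p_0\|_\infty 2^j j/n}$ by bisection. Here $b$ is a hypothetical stand-in for the constant $\|p_0\|_{t,\infty}C(\Phi)$ in the bias bound (45), so the equation is an illustration rather than the exact equation (49). The sketch then compares the resulting bias $2^{-j^*t}$ with the rate $(\log n/n)^{t/(2t+1)}$ in (27).

```python
import math


def balance_level(n, t, bias_const, sup_p0, lo=1.0, hi=60.0, iters=200):
    """Real j at which bias_const * 2^{-jt} equals sqrt(2 log 2 * sup_p0 * 2^j * j / n)."""
    def gap(j):
        bias = bias_const * 2.0 ** (-j * t)
        spread = math.sqrt(2.0 * math.log(2.0) * sup_p0 * 2.0 ** j * j / n)
        return bias - spread                    # strictly decreasing in j >= 1

    if gap(lo) <= 0.0 or gap(hi) >= 0.0:
        raise ValueError("no sign change of the balance equation on [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t = 1.0
for n in (10**4, 10**6, 10**8, 10**10):
    j_star = balance_level(n, t, bias_const=1.0, sup_p0=1.0)
    ratio = 2.0 ** (-j_star * t) / (math.log(n) / n) ** (t / (2 * t + 1))
    print(n, round(j_star, 3), round(ratio, 3))
```

The ratio in the last column stays roughly constant as $n$ grows. This is the numerical counterpart of the rate $(\log n/n)^{t/(2t+1)}$.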

Proposition 3 Let the conditions of Theorem 3 hold and let $j^* := j^*(t)$ be as in (49). Suppose that $p_0\in C^t(\mathbb R)$ for some $0<t<1$, that $\phi=\phi_1$ is the Haar wavelet, and that $j_n$ is as in Theorem 3. Then, for every $n$,

$$E\|p_n(j_n)-p_0\|_\infty \le 30\,E\|p_n(j^*)-p_0\|_\infty + O\Big(\Big(\frac{\log n}{n}\Big)^{2t/(2t+1)}\Big).$$



The (proof of the) previous proposition and (28) allow us to obtain an explicit upper bound for the asymptotic constant in the risk of the adaptive Haar-wavelet estimator. Recall the definition of $H(s,f)$ given earlier.



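Proposition 4 Let the conditions of Theorem 3 hold. If $p_0\in C^t(\mathbb R)$ for some $0<t\le 1$ and $\phi=\phi_1$ is the Haar wavelet, then

$$\limsup_{n\to\infty}\Big(\frac{n}{\log n}\Big)^{t/(2t+1)}E\|p_n(j_n)-p_0\|_\infty \le A(p_0),$$

where

$$A(p_0) = 26.6\,\|p_0\|_\infty^{t/(2t+1)}\Big(\frac{H(t,p_0)}{\sqrt{2\log 2}\,(1+t)}\Big)^{1/(2t+1)}.$$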

For example, if $t=1$, then $A(p_0)\le 20\|p_0\|_\infty^{1/3}\|Dp_0\|_\infty^{1/3}$. The best possible constant in the minimax risk is derived in Korostelev and Nussbaum (1999) for densities supported in $[0,1]$, and our bound misses that constant by a factor of less than 20. Some loss of efficiency in the asymptotic constant of any adaptive estimator is to be expected in our estimation problem; cf. Lepski (1992) and also Tsybakov (1998).

The choice of $j^*$ in Proposition 3 above is based on replacing the variance term $E\|p_n(j)-Ep_n(j)\|_\infty$ by its limit, which might be suboptimal in finite samples. So, for better finite-sample performance, an oracle that knows that $p_0\in C^t(\mathbb R)$ would choose the resolution level $j^\#$ so as to balance

$$B(j,p_0)\quad\text{and}\quad E\|p_n(j)-Ep_n(j)\|_\infty.$$
A slight modification of the procedure allows us to compare the risk of the adaptive estimator with that of $p_n(j^\#)$. Let $\mathcal J$ be as in the previous section, and define $j_n$ by

$$j_n = \min\Big\{j\in\mathcal J:\ \|p_n(j)-p_n(l)\|_\infty \le 5R(n,l) + 10\,\|p_n(j_{\max})\|_\infty^{1/2}\sqrt{\frac{2^l l}{n}}\ \ \forall\, l>j,\ l\in\mathcal J\Big\}. \qquad (29)$$
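Rule (29) can be sketched for the Haar basis under the following stated assumptions:

- the index set $\mathcal J$ is taken to be the integer range $[j_{\min}, j_{\max}]$ supplied by the caller (the actual choice of $\mathcal J$ is made in the previous section);
- $R(n,l)$ is computed from a single draw of Rademacher signs, with the normalization $\|\frac2n\sum_i\varepsilon_iK_l(\cdot,X_i)\|_\infty$;
- sup-norms are evaluated exactly on dyadic cells of length $2^{-j_{\max}}$.

The function names are hypothetical.

```python
import numpy as np


def haar_levels(x, weights, jmin, jmax):
    """(1/n) sum_i w_i K_j(y, X_i) on the level-jmax dyadic cells covering the data, jmin <= j <= jmax."""
    step = 2 ** (jmax - jmin)
    lo = int(np.floor(x.min() * 2.0**jmin)) * step
    hi = (int(np.floor(x.max() * 2.0**jmin)) + 1) * step
    cells = np.arange(lo, hi, dtype=np.int64)
    out = {}
    for j in range(jmin, jmax + 1):
        uniq, inv = np.unique(np.floor(x * 2.0**j).astype(np.int64), return_inverse=True)
        sums = np.bincount(inv.ravel(), weights=weights)
        bc = cells // 2 ** (jmax - j)               # level-j bin of each finest cell
        pos = np.clip(np.searchsorted(uniq, bc), 0, len(uniq) - 1)
        out[j] = np.where(uniq[pos] == bc, sums[pos], 0.0) * 2.0**j / x.size
    return out


def select_level_29(x, jmin, jmax, seed=0):
    """Smallest j passing the tests in (29) against every l in (j, jmax], Haar basis."""
    x = np.asarray(x, dtype=float)
    n = x.size
    eps = np.random.default_rng(seed).choice([-1.0, 1.0], size=n)
    p = haar_levels(x, np.ones(n), jmin, jmax)
    signed = haar_levels(x, eps, jmin, jmax)
    R = {l: 2.0 * np.max(np.abs(signed[l])) for l in p}      # R(n, l)
    root_sup = np.sqrt(np.max(p[jmax]))                         # ||p_n(j_max)||_inf^{1/2}

    def threshold(l):
        return 5.0 * R[l] + 10.0 * root_sup * np.sqrt(2.0**l * l / n)

    for j in range(jmin, jmax + 1):
        if all(np.max(np.abs(p[j] - p[l])) <= threshold(l) for l in range(j + 1, jmax + 1)):
            return j


rng = np.random.default_rng(3)
print(select_level_29(rng.beta(2.0, 5.0, size=20000), 2, 9))
```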

Proposition 5 Suppose $p_0$ is in $C^t(\mathbb R)$ for some $0<t<1$. Let $\phi=\phi_1$ be the Haar wavelet, and let $j_n$ and $j^\#$ be defined as in (29) and (56), respectively. Then, for every $n$,

$$E\|p_n(j_n)-p_0\|_\infty \le 51\,E\|p_n(j^\#)-p_0\|_\infty + O(n^{-1/2}) + O(\ldots).$$

We should note that $E\|p_n(j^\#)-p_0\|_\infty = O\big((n/\log n)^{-t/(2t+1)}\big)$ can be shown to hold (using Lemma 7). Hence this estimator satisfies the conclusion of Theorem 3 (with $r=1$) as well.

The 'oracles' from Propositions 3 and 5 only use knowledge of the fact that $p_0\in C^t(\mathbb R)$. If $p_0$ were known completely, one could still improve on $j^\#$ by using $\|Ep_n(j)-p_0\|_\infty$ directly instead of its upper bound $B(j,p_0)$. In fact, under complete knowledge of $p_0$, a 'Haar-oracle' would choose a resolution level $j^H$ that satisfies

$$\inf_{j\in\mathcal J} E\|p_n(j)-p_0\|_\infty = E\|p_n(j^H)-p_0\|_\infty. \qquad (30)$$

To mimic such an oracle in sup-norm loss is more difficult. Note first that the procedures $j_n$ used here are all implicitly based on estimating the unknown bias bound $B(j,p_0)$. The space $C^t(\mathbb R)$ contains functions that are not contained in $C^{t+\delta}(\mathbb R)$ for any $\delta>0$, but still do not attain a critical Hölder singularity of order $t$ at any point $x\in\mathbb R$. More precisely, let $f\in C^t(\mathbb R)$ for $t\in(0,1]$, set

$$H(f,x,v,t) := \frac{|f(x+v)-f(x)|}{|v|^t},$$

and define the pointwise Hölder exponent

$$t(f,x) := \sup\{t:\ H(f,x,v,t)\le C \text{ for some } C \text{ and all } v\neq 0 \text{ in a neighborhood of } 0\}.$$

Several things can happen:

- If the exponent is not attained (so that $f$ is not $t(f,x)$-Hölder at $x$), the limit as $|v|\to 0$ of $H(f,x,v,t)$ equals 0 for every $t<t(f,x)$.
- Even if $f$ is $t(f,x)$-Hölder at $x$, it can happen that $H(f,x,v,t(f,x))\to 0$ (for instance, it could be of order $1/\log(1/|v|)$).
- The limit $\lim_{|v|\to 0}H(f,x,v,t(f,x))$ may fail to exist.

We refer to Jaffard (1997a,b), where these phenomena are investigated in more detail, and where it is shown that this somewhat pathological behavior does not occur for a large class of 'self-similar' functions. As soon as the true density attains a critical Hölder singularity at one point, the sup-norm risk of the oracle estimator is driven by the risk at such a 'critical' point, and then we can prove an oracle inequality. We note that similar assumptions were necessary in the construction of adaptive pointwise confidence intervals; cf. Picard and Tribouley (2000).
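To illustrate these cases, one can evaluate the Hölder quotient $H(f,x,v,t) = |f(x+v)-f(x)|/|v|^t$ at $x=0$ for three functions with $f(0)=0$. These functions are chosen here for illustration and are not taken from the text:

- $f(y)=|y|^\alpha$: the exponent $\alpha$ is attained, and the quotient at $t=\alpha$ is constant.
- $f(y)=|y|^\alpha/\log(1/|y|)$ near 0: $f$ is $\alpha$-Hölder at 0, but $H(f,0,v,\alpha)\to 0$ like $1/\log(1/|v|)$.
- $f(y)=|y|^\alpha\log(1/|y|)$ near 0: the pointwise exponent $\alpha$ is not attained, since $H(f,0,v,\alpha)\to\infty$ while $H(f,0,v,t)\to 0$ for $t<\alpha$.

```python
import numpy as np

ALPHA = 0.5

# f(y) = g(|y|) for y != 0 and f(0) = 0; each g is only evaluated at u = |v| in (0, 1).
EXAMPLES = {
    "attained": lambda u: u ** ALPHA,
    "attained, H -> 0": lambda u: u ** ALPHA / np.log(1.0 / u),
    "not attained": lambda u: u ** ALPHA * np.log(1.0 / u),
}


def holder_quotient_at_zero(g, v, t):
    """H(f, 0, v, t) = |f(v) - f(0)| / |v|^t for f(y) = g(|y|) with f(0) = 0."""
    u = np.abs(np.asarray(v, dtype=float))
    return np.abs(g(u)) / u ** t


v = np.logspace(-2, -12, 6)  # increments 1e-2, 1e-4, ..., 1e-12 shrinking to 0
for name, g in EXAMPLES.items():
    print(name, holder_quotient_at_zero(g, v, ALPHA))
```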

Let now either $p_0\in C^1(\mathbb R)$, or assume $p_0\in C^t(\mathbb R)$ for some $0<t<1$ but $p_0\notin C^{t+\delta}(\mathbb R)$ for any $\delta>0$. For instance, this assumption is satisfied if $p_0\in C^t(\mathbb R)$ and, for some $x'\in\mathbb R$, $\liminf_{v\to 0}H(p_0,x',v,t)/\omega(1/|v|)>0$, where $x^\delta\omega(x)\to\infty$ as $x\to\infty$ for every $\delta>0$. Recall the definition of $H(s,f)$ given earlier. For $x\in\mathbb R$, let $k(x)$ be the integer satisfying $2^lx-1<k(x)\le 2^lx$, and define

$$W(l,p_0) := \frac{2^{lt}}{H(t,p_0)}\sup_{x\in\mathbb R}\Big|\int_{k(x)-2^lx}^{k(x)-2^lx+1}\big(p_0(x+2^{-l}u)-p_0(x)\big)\,du\Big|\wedge 1. \qquad (31)$$
We show in the proof of the following proposition that $W(l,p_0)>0$ for all $l\in\mathbb N$ if $p_0$ is a uniformly continuous density. The (inverse of the) function $W(l,p_0)$ measures the loss of adaptation compared with the oracle if the true density $p_0$ does not attain a critical Hölder singularity at any point. Suppose $p_0$ is 'self-similar' in the sense that

$$\sup_k|\beta_{lk}(p_0)| \ge 2^{-l(t+1/2)}w(l) \qquad (32)$$

for some positive function $w(l)$. Then one can use, e.g., (…) to obtain a simple (not necessarily optimal) lower bound for $W(l,p_0)$ of the form $cw(l)$, where $c>0$.

For instance, suppose $\lim_{v\to 0}(p_0(x'+v)-p_0(x'))/(|v|^t\operatorname{sign}(v)) = H'>0$ at some $x'\in\mathbb R$. Then Condition (32) can be shown to hold with $w(l)$ tending to a constant $w\ge H'(1-2^{-t})/(t+1)$ as $l\to\infty$. This implies that $W(l,p_0)$ is bounded from below uniformly in $l$ and converges to some $W\ge H'(1-2^{-t})/H(t,p_0)$. In particular, if $p_0\in C^1(\mathbb R)$, then $W(l,p_0)\to W\ge 1/2$. Note also that $j^H\to\infty$ as $n\to\infty$.

Proposition 6 Let $p_n(j_n)$ be the estimator from Proposition 5, and let $j^H$ be as in (30). Suppose $p_0\in C^1(\mathbb R)$, or assume $p_0\in C^t(\mathbb R)$ for some $0<t<1$ but $p_0\notin C^{t+\delta}(\mathbb R)$ for any $\delta>0$. Then, for every $n$,

$$E\|p_n(j_n)-p_0\|_\infty \le \frac{\cdots}{W(j^H,p_0)}\,E\|p_n(j^H)-p_0\|_\infty + O(n^{-1/2}) + O(\ldots).$$

5 Proofs of the Main Results 

5.1 Projections onto spline spaces and their wavelet representation 

We briefly review in this section how the linear wavelet estimator for Battle-Lemarié wavelets can be represented as a spline projection estimator (7). We shall need the spline representation in some proofs, while the wavelet representation will be useful in others.

Let $\mathcal T := \mathcal T_j = \{t_i\}_{i=-\infty}^{\infty} = 2^{-j}\mathbb Z$, $j\in\mathbb Z$, be a bi-infinite sequence of equally spaced knots. A function $S$ is a spline of order $r$, or of degree $m=r-1$, if both of the following hold: on each interval $(t_i,t_{i+1})$ it is a polynomial of degree less than or equal to $m$ (and exactly of degree $m$ on at least one interval), and at each breakpoint $t_i$ it is at least $(m-1)$-times differentiable. The Schoenberg space $\mathcal S_r(\mathcal T) := \mathcal S_r(\mathcal T,\mathbb R)$ is defined as the set of all splines of order $r$. It coincides with the space $\mathcal S_r(\mathcal T,1,\mathbb R)$ in DeVore and Lorentz (1993, p. 135). The space $\mathcal S_r(\mathcal T_j)$ has a Riesz basis formed by B-splines $\{N_{j,k,r}\}_{k\in\mathbb Z}$, which we now describe. [See Section 4.4 in Schumaker (1993) and p. 138f. in DeVore and Lorentz (1993) for more details.] Define

$$N_{0,r}(x) = \underbrace{1_{[0,1)}*\cdots*1_{[0,1)}}_{r\text{ times}}(x) = \frac{1}{(r-1)!}\sum_{i=0}^{r}(-1)^i\binom{r}{i}(x-i)_+^{r-1}.$$

For $r=2$, this is the linear B-spline (the usual 'hat' function); for $r=3$ it is the quadratic, and for $r=4$ the cubic B-spline. Set $N_{k,r}(x) := N_{0,r}(x-k)$. Then the elements of the Riesz basis are given by

$$N_{j,k,r}(x) := N_{k,r}(2^jx) = N_{0,r}(2^jx-k).$$
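The truncated-power formula for $N_{0,r}$ is easy to evaluate directly. The following sketch implements it together with the dilated translates $N_{j,k,r}$. It also checks two standard properties of cardinal B-splines: the integer translates of $N_{0,r}$ sum to 1, and $\int N_{0,r}=1$. For $r=1$ the convention $(x-i)_+^0 = 1_{\{x\ge i\}}$ gives $1_{[0,1)}$. The direct formula suffers from cancellation far to the right of the support and for large $r$; in those cases a recursive evaluation would be preferable.

```python
import numpy as np
from math import comb, factorial


def cardinal_bspline(x, r):
    """N_{0,r}(x) = sum_i (-1)^i C(r, i) (x - i)_+^{r-1} / (r-1)!, supported on [0, r]."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for i in range(r + 1):
        shifted = x - i
        # truncated power (x - i)_+^{r-1}, with (x - i)_+^0 read as the indicator of x >= i
        total += (-1) ** i * comb(r, i) * np.where(shifted >= 0.0, shifted ** (r - 1), 0.0)
    return total / factorial(r - 1)


def bspline_basis(x, j, k, r):
    """N_{j,k,r}(x) = N_{0,r}(2^j x - k)."""
    return cardinal_bspline(2.0**j * np.asarray(x, dtype=float) - k, r)


print(cardinal_bspline([0.5, 1.0, 1.5], 2))   # values of the hat function
```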

By the Curry-Schoenberg theorem, any $S\in\mathcal S_r(\mathcal T_j)$ can be uniquely represented as $S(x) = \sum_{k\in\mathbb Z}c_kN_{j,k,r}(x)$. The orthogonal projection $\pi_j(f)$ of $f\in L^2(\mathbb R)$ onto $\mathcal S_r(\mathcal T_j)\cap L^2(\mathbb R)$ is derived, e.g., in DeVore and Lorentz (1993, p. 401f.). There it is shown that $\pi_j(f) = 2^{j/2}\sum_{k\in\mathbb Z}c_kN_{j,k,r}$, with the coefficients $c_k := c_k(f)$ satisfying $(Ac)_k = 2^{j/2}\int N_{j,k,r}(x)f(x)\,dx$, where the matrix $A$ is given by

$$a_{kl} = 2^j\int N_{j,k,r}(x)N_{j,l,r}(x)\,dx = \int N_{k,r}(x)N_{l,r}(x)\,dx. \qquad (33)$$

The inverse $A^{-1}$ of the matrix $A$ exists (see Corollary 4.2 on p. 404 in DeVore and Lorentz (1993)). Denote its entries by $b_{kl}$, so that $c_k = 2^{j/2}\sum_l b_{kl}\int N_{j,l,r}(x)f(x)\,dx$. Then

$$\pi_j(f)(y) = 2^j\int\sum_k\sum_l b_{kl}N_{j,l,r}(x)N_{j,k,r}(y)f(x)\,dx = \int\kappa_j(x,y)f(x)\,dx,$$

where $\kappa_j(x,y) = 2^j\kappa(2^jx,2^jy)$ and where

$$\kappa(x,y) = \sum_k\sum_l b_{kl}N_{l,r}(x)N_{k,r}(y) \qquad (34)$$

is the spline projection kernel. Note that $\kappa$ is symmetric in its arguments.
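For $r=2$ the kernel (34) has the closed form $\kappa(x,y) = \sqrt 3\sum_{k,l}\lambda^{|k-l|}N_{0,2}(x-k)N_{0,2}(y-l)$ with $\lambda = -2+\sqrt 3$ (Remark 2). The sketch below evaluates a truncated version and checks numerically three properties: $\kappa$ is symmetric, $\int\kappa(x,y)\,dy = 1$, and $\int\kappa(x,y)\,y\,dy = x$. The last two say that the projection reproduces polynomials of degree at most 1, which is the property used in the bias bound (39). The truncation width and the quadrature grid are arbitrary choices.

```python
import numpy as np

LAM = -2.0 + np.sqrt(3.0)


def hat(x):
    """Linear B-spline N_{0,2}, supported on [0, 2]."""
    return np.clip(1.0 - np.abs(np.asarray(x, dtype=float) - 1.0), 0.0, None)


def kappa(x, y, width=30):
    """Truncated kernel (34) for r = 2: sum_{k,l} sqrt(3) LAM^{|k-l|} N_{0,2}(x - k) N_{0,2}(y - l)."""
    k0 = int(np.floor(x))
    idx = np.arange(k0 - width, k0 + width + 1)
    b = np.sqrt(3.0) * LAM ** np.abs(idx[:, None] - idx[None, :])
    nx = hat(x - idx)                                      # N_{0,2}(x - k)
    ny = hat(np.asarray(y, dtype=float)[..., None] - idx)  # N_{0,2}(y - l)
    return ny @ (b @ nx)


def moment(x, power, half_width=25, per_unit=200):
    """Trapezoid approximation of int kappa(x, y) y^power dy."""
    k0 = int(np.floor(x))
    y = (k0 - half_width) + np.arange(2 * half_width * per_unit + 1) / per_unit
    f = kappa(x, y) * y ** power
    return (f.sum() - 0.5 * (f[0] + f[-1])) / per_unit


print(kappa(0.3, 1.7), kappa(1.7, 0.3))
print(moment(0.3, 0), moment(0.3, 1))
```

Because $\kappa(x,\cdot)$ is a linear spline with integer knots, the trapezoid rule on a grid containing the integers is exact for the zeroth moment up to truncation.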

The idea behind the Battle-Lemarié wavelets is to diagonalize the kernel $\kappa$ of the projection operator $\pi_j$ or, equivalently, to construct an orthonormal basis for the space $\mathcal S_r(\mathcal T_j)$. This in fact led to one of the first examples of wavelets; see, e.g., p. 21f. and Section 2.3 in Meyer (1992), Section 5.4 in Daubechies (1992), or Section 6.1 in HKPT (1998). There it is shown that there exists an $(r-1)$-times differentiable father wavelet $\phi_r$ with exponential decay, the Battle-Lemarié wavelet of order $r$, such that

$$\mathcal S_r(\mathcal T_j)\cap L^2(\mathbb R) = V_{j,r} = \Big\{\sum_k c_k2^{j/2}\phi_r(2^j(\cdot)-k):\ \sum_k c_k^2<\infty\Big\}.$$

This necessarily implies that the kernels $\kappa$ and $K = K(\phi_r)$ describe the same projections in $L^2(\mathbb R)$. The following simple lemma shows that these kernels are in fact pointwise the same.

Lemma 1 Let $\{N_{k,r}\}_{k\in\mathbb Z}$ be the Riesz basis of B-splines of order $r\ge 1$, and let $\phi_r$ be the associated Battle-Lemarié father wavelet. If $K$ is as in (2) and $\kappa$ is as in (34), then, for all $x,y\in\mathbb R$,

$$K(x,y) = \kappa(x,y).$$

Proof. If $r=1$, then $N_{0,1} = \phi_1$, since this is just the Haar basis. So consider $r>1$. Since $\{\phi_r(\cdot-k):k\in\mathbb Z\}$ is an orthonormal basis of $\mathcal S_r(\mathbb Z)\cap L^2(\mathbb R)$ (cf., e.g., Theorem 1 on p. 26 in Meyer (1992)), $K$ and $\kappa$ are the kernels of the same $L^2$-projection operator. Therefore, for all $f,g\in L^2(\mathbb R)$,

$$\iint\big(K(x,y)-\kappa(x,y)\big)f(x)g(y)\,dx\,dy = 0.$$

Linear combinations of products of elements of $L^2(\mathbb R)$ are dense in $L^2(\mathbb R\times\mathbb R)$, so $\kappa$ and $K$ are almost everywhere equal in $\mathbb R^2$. We complete the proof by showing that both functions are continuous in $\mathbb R^2$. For $K$, continuity follows from the decomposition

$$|K(x,y)-K(x',y')| \le \sum_k|\phi_r(x-k)-\phi_r(x'-k)||\phi_r(y-k)| + \sum_k|\phi_r(y-k)-\phi_r(y'-k)||\phi_r(x'-k)|,$$

the uniform continuity of $\phi_r$ ($r>1$) and relation (…). For $\kappa$ we use the relation (36) below:

$$|\kappa(x,y)-\kappa(x',y')| \le \sum_l|N_{l,r}(x)-N_{l,r}(x')||H(y-l)| + \sum_l|H(y-l)-H(y'-l)||N_{l,r}(x')|.$$

This implies continuity of $\kappa$ on $\mathbb R^2$, since $N_{0,r}$ and $H$ are uniformly continuous (as $N_{0,r}$ is, and $\sum_l|g(|l|)|<\infty$) and since $N_{0,r}$ has compact support. ■

The fact that these kernels are pointwise the same allows us to compute the wavelet estimator for the Battle-Lemarié wavelets in terms of B-splines by the formula (7).

5.2 An exponential inequality for the uniform deviations of the linear estimator

To control the uniform deviations of the linear estimators from their means, one can use inequalities for the empirical process indexed by classes of functions $\mathcal F$ contained in

$$\mathcal K = \{2^{-j}K_j(\cdot,y):\ y\in\mathbb R,\ j\in\mathbb N\cup\{0\}\}, \qquad (35)$$

together with suitable bounds on $\sigma$.

If $K$ is a convolution kernel, then $\mathcal K$ is contained in the set of dilations and translations of a fixed function $K$. In that case $\mathcal K$ is of VC type (i.e., it satisfies (11)) if $K$ is of bounded variation, a result due to Nolan and Pollard (1987). In fact, bounded variation can be replaced by bounded $p$-variation for $p<\infty$ (see Lemma 1 in Giné and Nickl (2007)), which also allows for $\alpha$-Hölder kernels, $\alpha>0$.

Now let $K=K(\phi)$ be a wavelet projection kernel as in (2), with $\phi$ compactly supported (and of finite $p$-variation). Lemma 2 in Giné and Nickl (2007) proves that the class $\mathcal K$ also satisfies the bound (11). However, the proof there does not apply to Battle-Lemarié wavelets. A different proof, using the Toeplitz and band-limited structure of the spline projection kernel, still enables us to prove that these classes of functions are of Vapnik-Červonenkis type.

Lemma 2 Let $\mathcal K$ be as in (35), where $\phi_r$ is a Battle-Lemarié wavelet for some $r\ge 1$. Then there exist finite constants $A\ge 2$ and $v\ge 2$ such that

$$\sup_Q N(\mathcal K, L^2(Q),\varepsilon) \le \Big(\frac{A}{\varepsilon}\Big)^v$$

for $0<\varepsilon\le 1$, where the supremum extends over all Borel probability measures $Q$ on $\mathbb R$.

Proof. In the case $r=1$, $\phi_1$ is just the Haar wavelet, and the result follows from Lemma 2 in Giné and Nickl (2007). Hence assume $r\ge 2$.

By a change of variables in (33), $a_{kl} = a_{k+1,l+1}$ for all $k,l\in\mathbb Z$, so the matrix $A$ is Toeplitz. It is band-limited because $N_{0,r}$ has compact support. It follows that $A^{-1}$ is also Toeplitz, and we denote its entries by $b_{kl} = g(|k-l|)$ for some function $g$. Furthermore, it is known (e.g., Theorem 4.3 on p. 404 in DeVore and Lorentz (1993)) that the entries of the inverse of any positive definite band-limited matrix satisfy

$$|b_{kl}|\le c\lambda^{|k-l|}$$

for some $0<\lambda<1$ and $c$ finite. Now, following Huang and Studden (1992), we write

$$\sum_k g(|l-k|)N_{k,r}(x) = \sum_k g(|l-k|)N_{k-l,r}(x-l) = \sum_k g(|k|)N_{k,r}(x-l) = H(x-l),$$

so that

$$\kappa(x,y) = \sum_l H(x-l)N_{l,r}(y), \qquad (36)$$
where

$$H(x) = \sum_{k\in\mathbb Z}g(|k|)N_{k,r}(x)$$

is a function of bounded variation. To see the last claim, note that $N_{0,r}$ is of bounded variation, and hence $\|N_{k,r}\|_{TV} = \|N_{0,r}\|_{TV}$ (where $\|\cdot\|_{TV}$ denotes the usual total-variation norm). Therefore

$$\|H\|_{TV} \le \|N_{0,r}\|_{TV}\sum_k|g(|k|)| < \infty,$$

because $\sum_k|b_{l,l-k}|\le\sum_k c\lambda^{|k|}<\infty$. As proved in Nolan and Pollard (1987), the last fact implies that

$$\mathcal H = \{H(2^j(\cdot)-l):\ l\in\mathbb Z,\ j\in\mathbb N\cup\{0\}\}$$

satisfies, for finite constants $B\ge 1$ and $w\ge 1$,

$$\sup_Q N(\mathcal H,L^2(Q),\varepsilon) \le \Big(\frac{B}{\varepsilon}\Big)^w,\quad 0<\varepsilon\le 1.$$

Since $N_{j,0,r}(y)=0$ unless $y\in[0,2^{-j}r]$, the sum in (36), for fixed $y$ and $j$, extends only over the $l$'s such that $2^jy-r\le l\le 2^jy$. Hence it consists of at most $r$ nonzero terms. This implies that $\mathcal K$ is contained in the set $\mathcal H_r$ of linear combinations of at most $r$ functions from $\mathcal H$, with coefficients bounded in absolute value by $\|N_{j,l,r}\|_\infty = \|N_{0,r}\|_\infty<\infty$.

We now build a cover. Given $\varepsilon$, let $\varepsilon' = \varepsilon/(2r\max(\|H\|_\infty,\|N_{0,r}\|_\infty))$.

- Let $a_1,\dots,a_{n_1}$ be an $\varepsilon'$-dense subset of $[-\|N_{0,r}\|_\infty,\|N_{0,r}\|_\infty]$. For $\varepsilon'<\|N_{0,r}\|_\infty$, it has cardinality $n_1\le 3\|N_{0,r}\|_\infty/\varepsilon'$.
- Let $h_1,\dots,h_{n_2}$ be a subset of $\mathcal H$ of cardinality $n_2 = N(\mathcal H,L^2(Q),\varepsilon')$ which is $\varepsilon'$-dense in $\mathcal H$ in the $L^2(Q)$-metric.

It follows that, for $\varepsilon'<\min(\|H\|_\infty,\|N_{0,r}\|_\infty)$, every $\sum_{l\in\mathbb Z}N_{j,l,r}(y)H(2^j(\cdot)-l)$ is at $L^2(Q)$-distance at most $\varepsilon$ from some $\sum_l a_{i(l)}h_{i'(l)}$ with $1\le i(l)\le n_1$ and $1\le i'(l)\le n_2$. The total number of such linear combinations is dominated by $(n_1n_2)^r\le(B'/\varepsilon)^{(w+1)r}$. This shows that the lemma holds for

$$\varepsilon<2r\min\{\|H\|_\infty,\|N_{0,r}\|_\infty\}\max\{\|H\|_\infty,\|N_{0,r}\|_\infty\} = 2r\|H\|_\infty\|N_{0,r}\|_\infty =: U.$$

Taking $A = \max(B',U,e)$ completes the proof, since for $\varepsilon\in[U,A]$ one ball covers the whole set. ■

This lemma implies the following result. 

Proposition 7 Let $K$ be as in (2), and assume either that $\phi$ has compact support and is of bounded $p$-variation ($p<\infty$), or that $\phi$ is a Battle-Lemarié father wavelet for some $r\ge 1$. Suppose $P$ has a bounded density $p_0$. Given $C,T>0$, there exist finite positive constants $C_1 = C_1(C,K,\|p_0\|_\infty)$ and $C_2 = C_2(C,T,K,\|p_0\|_\infty)$ such that, if

$$\frac{n}{2^jj}\ge C\quad\text{and}\quad C_1\sqrt{\frac{2^jj}{n}}\le t\le T,$$

then

$$\Pr\Big\{\sup_{y\in\mathbb R}|p_n(y,j)-Ep_n(y,j)|>t\Big\} \le \exp\Big(-C_2\frac{nt^2}{2^j}\Big). \qquad (37)$$



Proof. We first prove the Battle-Lemarié wavelet case. If $r>1$, the function $K$ is continuous (see the proof of Lemma 1), and therefore the supremum in (37) is over a countable set. That this is also true for $r=1$ follows from Remark 1 in Giné and Nickl (2007). We apply Proposition 1 and Lemma 2 to the supremum of the empirical process indexed by the classes of functions

$$\mathcal K_j := \{2^{-j}K_j(\cdot,y)/(2\|\Phi\|_\infty):\ y\in\mathbb R\},$$

where $\Phi$ is a function majorizing $K$ (as in (3)), so that $\mathcal K_j$ is uniformly bounded by 1/2. We next bound the second moments $E(2^{-2j}K_j^2(X,y))$. Using (3), we have

$$\int 2^{-2j}K_j^2(x,y)p_0(x)\,dx \le \int\Phi^2(|2^j(x-y)|)p_0(x)\,dx = 2^{-j}\int\Phi^2(|u|)p_0(y+2^{-j}u)\,du \le 2^{-j}\|p_0\|_\infty\|\Phi\|_2^2. \qquad (38)$$

Hence we may take $\sigma = \sqrt{2^{-j}\|\Phi\|_2^2\|p_0\|_\infty}/(2\|\Phi\|_\infty)$. The result is then a direct consequence of Proposition 1, which applies by Lemma 2. For compactly supported wavelets, the same proof applies, using Lemma 2 (and Remark 1) in Giné and Nickl (2007). ■

Using Proposition 1 and keeping track of the constants in the proof of Lemma 2, one could also obtain explicit constants in inequality (37). For applications in limit theorems, unspecified constants suffice.

Proof (of Theorem 1). Using Lemma 2, the first two claims of the theorem follow by the same proof as in Theorem 1 and Corollary 1 of Giné and Nickl (2007). For the bias term, we argue as follows. It is well known that if $\phi$ is $m$-times differentiable with derivatives in $L^p(\mathbb R)$ for some $1\le p\le\infty$, then the projection kernel reproduces polynomials of degree less than or equal to $m$. That is, for every $x$ and $0\le\alpha\le m$,

$$\int K(x,y)y^\alpha\,dy = x^\alpha;$$

cf., e.g., Theorem 8.2 in HKPT (1998). Recall from Section 5.1 that $\phi_r$ is $(r-1)$-times differentiable. If $p_0\in C^t(\mathbb R)$, then by Taylor expansion and the mean value theorem we can write

$$\begin{aligned}
|Ep_n(x)-p_0(x)| &= \Big|\int K_j(x,y)\big(p_0(y)-p_0(x)\big)\,dy\Big|\\
&\le \int|K_j(x,y)|\,\big|D^{[t]}p_0(x+\theta(y-x))-D^{[t]}p_0(x)\big|\,|y-x|^{[t]}\,dy\\
&\le 2^j\int\Phi(|2^j(y-x)|)\,\big|D^{[t]}p_0(x+\theta(y-x))-D^{[t]}p_0(x)\big|\,|y-x|^{[t]}\,dy\\
&= 2^{-j[t]}\int\Phi(|u|)\,|u|^{[t]}\,\big|D^{[t]}p_0(x+\theta2^{-j}u)-D^{[t]}p_0(x)\big|\,du\\
&\le 2^{-jt}\|p_0\|_{t,\infty}C \qquad (39)
\end{aligned}$$

for $t$ noninteger, where $\theta\in[0,1]$ and $C := C(\Phi) = \int\Phi(|u|)|u|^t\,du$. The proof of the same inequality for $r>t\in\mathbb N$ is similar (in fact shorter) and is omitted. ■



5.3 An exponential inequality for the distribution function of the linear estimator

The quantity of interest in this subsection is the distribution function of the linear projection estimator $p_n$ from (7). More precisely, we will study the stochastic process

$$\sqrt n\,\big(F_n(j)(s)-F(s)\big) = \sqrt n\int_{-\infty}^s\big(p_n(y,j)-p_0(y)\big)\,dy,\quad s\in\mathbb R.$$
To prove a functional CLT for this process, it turns out to be easier to compare with $F_n$ rather than with $F$. With $\mathcal F = \{1_{(-\infty,s]}:s\in\mathbb R\}$, the decomposition

$$\big(F_n(j)-F_n\big)(s) = (P_n-P)\big(\pi_j(f)-f\big) + \int\big(\pi_j(p_0)-p_0\big)f,\quad f = 1_{(-\infty,s]}\in\mathcal F, \qquad (40)$$

will be useful, since it splits the quantity of interest into a deterministic 'bias' term and an empirical process.

We first give a bound on the deterministic term. Showing that $\int(\pi_j(p_0)-p_0)f = O(2^{-jt})$ is quite straightforward by the usual bias techniques. However, to obtain meaningful results for the most interesting choices of $j$, the sharper bound $O(2^{-j(t+1)})$ is crucial. It can be obtained as follows.

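Lemma 3 Assume that $p_0$ is a bounded function ($t=0$), or that $p_0\in C^t(\mathbb R)$ for some $0<t<r$. Let $\mathcal F = \{1_{(-\infty,s]}:s\in\mathbb R\}$. Then

$$\sup_{f\in\mathcal F}\Big|\int\big(\pi_j(p_0)-p_0\big)f\Big| \le C2^{-j(t+1)} \qquad (41)$$

for some constant $C$ depending only on $r$ and $\|p_0\|_{t,\infty}$.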

Proof. Let $\psi := \psi_r$ be the mother wavelet associated with $\phi_r$. Since the wavelet series of $p_0\in L^1(\mathbb R)$ converges in $L^1(\mathbb R)$, we have

$$\pi_j(p_0)-p_0 = -\sum_{l=j}^\infty\sum_k\beta_{lk}(p_0)\psi_{lk}$$

in the $L^1(\mathbb R)$ sense. Therefore, since $f = 1_{(-\infty,s]}\in L^\infty(\mathbb R)$, we have

$$-\int\big(\pi_j(p_0)-p_0\big)f = \int\Big(\sum_{l=j}^\infty\sum_k\beta_{lk}(p_0)\psi_{lk}(x)\Big)f(x)\,dx = \sum_{l=j}^\infty\sum_k\beta_{lk}(p_0)\int f(x)\psi_{lk}(x)\,dx = \sum_{l=j}^\infty\sum_k\beta_{lk}(p_0)\beta_{lk}(f). \qquad (42)$$

The lemma now follows from an estimate for the decay of the wavelet coefficients of $p_0$ and $f$, namely the bounds

$$\sup_{f\in\mathcal F}\sum_k|\beta_{lk}(f)|\le c2^{-l/2}\quad\text{and}\quad\sup_k|\beta_{lk}(p_0)|\le c'2^{-l(t+1/2)}. \qquad (43)$$

The first bound is proved as in the proof of Lemma 3 in Giné and Nickl (2007), noting that the identity before equation (37) in that proof also holds for spline wavelets by their exponential decay property. The second bound follows from

$$\sup_k|\beta_{lk}(p_0)| \le c''2^{-l/2}\|K_{l+1}(p_0)-K_l(p_0)\|_\infty \le c''2^{-l/2}\big(\|K_l(p_0)-p_0\|_\infty+\|K_{l+1}(p_0)-p_0\|_\infty\big) \le C'2^{-l/2}2^{-lt},$$

where we used (9.35) in HKPT (1998) for the first inequality and (39) in the last. ■
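For the Haar basis ($r=1$) the quantity in (41) can be computed exactly, which gives a quick sanity check of the $2^{-j(t+1)}$ rate. In that case $\pi_j(p_0)$ is the histogram with dyadic bins of length $h=2^{-j}$. Hence $s\mapsto\int_{-\infty}^s\pi_j(p_0)$ is the piecewise-linear interpolant of $F$ at the points $kh$, and the standard interpolation error bound gives $\sup_s|\int(\pi_j(p_0)-p_0)1_{(-\infty,s]}|\le\frac{h^2}{8}\sup|p_0'|$.

The sketch uses the density $p_0(x)=6x(1-x)$ on $[0,1]$, an arbitrary Lipschitz example with $\sup|p_0'|=6$. Strictly speaking, Lemma 3 with $r=1$ covers $t<1$; this smooth case is only meant to exhibit the decay faster than $2^{-j}$.

```python
import numpy as np


def cdf(s):
    """F for the density p0(x) = 6x(1 - x) on [0, 1]."""
    s = np.clip(s, 0.0, 1.0)
    return 3.0 * s**2 - 2.0 * s**3


def haar_projection_cdf(s, j):
    """int_{-inf}^s pi_j(p0): linear interpolation of F between the dyadic points k 2^{-j}."""
    h = 2.0 ** (-j)
    left = np.floor(s / h) * h
    right = left + h
    return cdf(left) + (s - left) * (cdf(right) - cdf(left)) / h


def sup_deviation(j, m=200_001):
    """Maximum over a fine grid of s of |int (pi_j(p0) - p0) 1_(-inf, s]|."""
    s = np.linspace(-0.5, 1.5, m)
    return float(np.max(np.abs(haar_projection_cdf(s, j) - cdf(s))))


for j in range(1, 9):
    print(j, sup_deviation(j), 6.0 / 8.0 * 4.0 ** (-j))   # observed value vs. h^2/8 * sup|p0'|
```

The observed deviation shrinks by a factor close to 4 per level, i.e. like $2^{-2j}$, in line with (41) for a smooth density.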



To control the fluctuations of the stochastic term, one applies Talagrand's inequality to the empirical process indexed by the 'shrinking' classes of functions $\{\pi_j(f)-f: f\in\mathcal F\}$. These classes consist of differences of elements in $\mathcal F$ and in

$$\mathcal K'_j = \Big\{\int_{-\infty}^tK_j(\cdot,y)\,dy:\ t\in\mathbb R\Big\},$$

and we have to show that, for each $j$, this class satisfies the entropy condition (11). Again, for $\phi$ with compact support (and of finite $p$-variation), this result was proved in Lemma 2 in Giné and Nickl (2007), but we now have to extend it to the Battle-Lemarié wavelets considered here.

Lemma 4 Let $\mathcal K'_j$ be as above, where $\phi_r$ is a Battle-Lemarié wavelet for $r\ge 1$. Then there exist finite constants $A\ge e$ and $v\ge 2$, independent of $j$, such that

$$\sup_Q N(\mathcal K'_j,L^2(Q),\varepsilon)\le\Big(\frac{A}{\varepsilon}\Big)^v,\quad 0<\varepsilon\le 1,$$

where the supremum extends over all Borel probability measures $Q$ on $\mathbb R$.
Proof. In analogy to the proof of Lemma 2, one can write

$$\int_{-\infty}^tK_j(\cdot,y)\,dy = \sum_l\int_{-\infty}^t2^jN_{j,l,r}(y)\,dy\;H(2^j(\cdot)-l),$$

since the series (36) converges absolutely, in view of

$$\sum_l|H(2^jx-l)| \le \sum_l\sum_k|g(|k|)|N_{k,r}(2^jx-l) \le r\|N_{0,r}\|_\infty\sum_k|g(|k|)| < \infty.$$

Recall that $N_{j,l,r}$ is supported in the interval $[2^{-j}l,2^{-j}(r+l)]$. Hence the last integral depends on $l$ as follows:

- if $l\ge 2^jt$, it is zero;
- if $l\le 2^jt-r$, it equals the constant $c = \int_{\mathbb R}N_{0,r}(y)\,dy$;
- if $l\in[2^jt-r,2^jt]$, it is a number $c_{j,l,r}$ bounded by $c$.

So this sum in fact equals

$$c\sum_{l\le 2^jt-r}H(2^j(\cdot)-l) + \sum_{2^jt-r<l<2^jt}c_{j,l,r}H(2^j(\cdot)-l).$$

The second sum is contained in the set $\mathcal H_r$ from the proof of Lemma 2, which satisfies the required entropy bound independently of $j$. For the first sum, decompose $H$ into its positive and negative parts. The two resulting collections of functions are then linearly ordered (in $t$) by inclusion, and hence are VC-subgraph of index 1; see Theorems 4.2.6 and 4.8.1 in Dudley (1999). Moreover, we can take the envelope $r\|N_{0,r}\|_\infty\sum_k|g(|k|)|$, independent of $j$. Combining entropy bounds proves the lemma. ■

Combining these observations, one can prove the following inequality, which implies the central limit theorem in Theorem 2. This proposition parallels Theorem 1 in Giné and Nickl (2008) for the classical kernel density estimator, and Lemma 4 in Giné and Nickl (2007) for the wavelet density estimator (with $\phi$ compactly supported).
Proposition 8 Let $F_n(s) = \int 1_{(-\infty,s]}\,dP_n$ and $F_n(j)(s) := \int_{-\infty}^s p_n(y,j)\,dy$, where $p_n$ is as in (7). Assume that the density $p_0$ of $P$ is a bounded function ($t=0$), or that $p_0\in C^t(\mathbb R)$ for some $t$ with $0<t<r$. Let $j\in\mathbb Z$ satisfy $2^{-j}\ge(\log n)/n$. Then there exist finite positive constants $L := L(\|p_0\|_\infty,K)$ and $\Lambda_0 := \Lambda_0(\|p_0\|_{t,\infty},K)$ such that, for all $n\in\mathbb N$ and $\lambda\ge\Lambda_0\max\big(\sqrt{j2^{-j}},\sqrt n\,2^{-j(t+1)}\big)$,

$$\Pr\big(\sqrt n\,\|F_n(j)-F_n\|_\infty>\lambda\big) \le L\exp\Big\{-\frac{\min(2^j\lambda^2,\sqrt n\lambda)}{L}\Big\}.$$

Proof. Given the preceding lemmas, the proposition follows from Talagrand's inequality applied to the class $\{\pi_j(1_{(-\infty,x]})-1_{(-\infty,x]}\}$, in the same way as in the proof of Lemma 4 in Giné and Nickl (2007), so we omit it. ■



5.4 Proof of Theorem 3

We can now prove the main result, Theorem 3. We will prove it only for Battle-Lemarié wavelets. For compactly supported wavelets, the proof is exactly the same: one replaces the results from Steps I) and II) below, and from Sections 5.2 and 5.3, for spline wavelets by the corresponding ones for compactly supported wavelets obtained in Giné and Nickl (2007). Also, uniformity in $p_0$, which is proved by controlling the respective constants, is left implicit in the derivations. We start with some preliminary observations.

I) Uniformly in $j\in\mathcal J$, we have $n/(2^jj)\ge c\log n$ for some $c>0$ independent of $n$. Hence, integrating tail probabilities in Proposition 7,

$$E\|p_n(j)-Ep_n(j)\|_\infty^p \le D^p\Big(\frac{2^jj}{n}\Big)^{p/2} =: D^p\sigma^p(j,n) \qquad (44)$$

for every $j\in\mathcal J$, $1\le p<\infty$, and some $0<D<\infty$ depending only on $\|p_0\|_\infty$ and $\Phi$. For the bias, we recall from (39) that, for $0<t<r$,

$$|Ep_n(y,j)-p_0(y)| \le 2^{-jt}\|p_0\|_{t,\infty}C(\Phi) =: B(j,p_0). \qquad (45)$$

If the density $p_0$ is only uniformly continuous, then (3) and the integrability of $\Phi$ still give, uniformly in $y\in\mathbb R$,

$$|Ep_n(y,j)-p_0(y)| \le \sup_{y\in\mathbb R}\int\Phi(|u|)\,|p_0(y-2^{-j}u)-p_0(y)|\,du =: B(j,p_0) = o(1). \qquad (46)$$



II) Define $\hat M := \hat M_n = C\|p_n(j_{\max})\|_\infty$ and set $C = 49\|\Phi\|_2^2$. Define also $M = C\|p_0\|_\infty$ for the same $C$. We need to control the probability that $\hat M>1.01M$ or $\hat M<0.99M$ if $p_0$ is uniformly continuous. For some $0<L<\infty$ and $n$ large enough we have

$$\begin{aligned}\Pr\big(|\hat M-M|>0.01C\|p_0\|_\infty\big) &= \Pr\big(\big|\|p_n(j_{\max})\|_\infty-\|p_0\|_\infty\big|>0.01\|p_0\|_\infty\big)\\
&\le \Pr\big(\|p_n(j_{\max})-p_0\|_\infty>0.01\|p_0\|_\infty\big)\\
&\le \Pr\big(\|p_n(j_{\max})-Ep_n(j_{\max})\|_\infty>0.01\|p_0\|_\infty-B(j_{\max},p_0)\big)\\
&\le \Pr\big(\|p_n(j_{\max})-Ep_n(j_{\max})\|_\infty>0.009\|p_0\|_\infty\big)\\
&\le \exp\Big(-\frac{(\log n)^2}{L}\Big)\end{aligned}$$

by Proposition 7 and Step I). Furthermore, there exists a constant $L'$ such that $E\hat M\le L'$ for every $n$, in view of

$$E\|p_n(j_{\max})\|_\infty \le E\|p_n(j_{\max})-Ep_n(j_{\max})\|_\infty + \|Ep_n(j_{\max})\|_\infty \le C' + \|\Phi\|_1\|p_0\|_\infty,$$

where we have used (3) and (44).

Ill) We need some observations on the Rademacher processes used in the definition of j n . 
First, for the symmetrized empirical measure P n — 2n~ 1 X)"=i s i^Xa we have 

R(n,j) = IMP^IU = K(7n(P„))||oc < hj\LR(n,l) < B{<f>)R(n,l) (47) 

for every / > j: Here 11^11^ is the operator norm in L°°(R) of the projection nj, which admits 
bounds B((f>) independent of j. (Clearly, Wj acts on finite signed measures \i by duality, taking 
values in L°°(R) since = \ J K j {-,y)dn(y)\ < 2- J '||i>|| 00 |/x|(R).) See Remark[3]for details on 

how to obtain B (</>). Integrating the last chain of inequalities establishes (|4"7f also for E £ R(n,j). 
Furthermore, for $j < l$,

$$T(n,j,l) \le R(n,j) + R(n,l) \le (1 + B(\phi))R(n,l), \tag{48}$$

and the same inequality holds for the Rademacher expectations of $T(n,j,l)$. We also record the following bound for the (full) expectation of $R(n,l)$, $l \in \mathcal J$. Using inequality (21) and the variance computation (38), we have that there exists a constant $L$ depending only on $\|p_0\|_\infty$ and $\Phi$ such that, for every $l \in \mathcal J$,

$$ER(n,l) \le L\sqrt{\frac{2^l l}{n}}.$$
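To make (47) and (48) concrete, here is a sketch that computes $R(n,j)$ and $T(n,j,l)$ exactly from a sample and a vector of Rademacher signs. It covers only the Haar kernel, for which $\pi_j$ is a conditional expectation, so $B(\phi) = 1$. The helper names are our own. The supremum over $y \in \mathbb R$ is evaluated by using the fact that both processes are piecewise constant on dyadic bins.

```python
from collections import defaultdict

import numpy as np


def rademacher_sup(sample, signs, j):
    """R(n, j) = ||(2/n) sum_i eps_i K_j(., X_i)||_inf for the Haar kernel."""
    x, e = np.asarray(sample, dtype=float), np.asarray(signs, dtype=float)
    k = np.floor(x * 2.0**j).astype(int)
    # K_j(., X_i) equals 2^j on the dyadic bin of X_i and 0 elsewhere,
    # so the process is 2^j times the sum of the signs in each bin.
    _, inv = np.unique(k, return_inverse=True)
    bin_sums = np.bincount(inv.ravel(), weights=e)
    return 2.0 * 2.0**j * float(np.max(np.abs(bin_sums))) / len(x)


def rademacher_increment_sup(sample, signs, j, l):
    """T(n, j, l) = ||(2/n) sum_i eps_i (K_j - K_l)(., X_i)||_inf, Haar kernel, j < l."""
    x, e = np.asarray(sample, dtype=float), np.asarray(signs, dtype=float)
    kl = np.floor(x * 2.0**l).astype(int)
    kj = kl // 2 ** (l - j)                    # level-j parent of each level-l bin
    s_l, s_j, kids = defaultdict(float), defaultdict(float), defaultdict(set)
    for a, b, s in zip(kl.tolist(), kj.tolist(), e.tolist()):
        s_l[a] += s
        s_j[b] += s
        kids[b].add(a)
    # The process is constant on level-l bins: first the occupied ones ...
    best = max(abs(2.0**j * s_j[a // 2 ** (l - j)] - 2.0**l * s) for a, s in s_l.items())
    # ... then empty level-l bins inside occupied level-j bins (only K_j contributes).
    for b, children in kids.items():
        if len(children) < 2 ** (l - j):
            best = max(best, abs(2.0**j * s_j[b]))
    return 2.0 * best / len(x)


rng = np.random.default_rng(1)
x = rng.uniform(size=500)
eps = rng.choice([-1.0, 1.0], size=500)
R3, R6 = rademacher_sup(x, eps, 3), rademacher_sup(x, eps, 6)
T36 = rademacher_increment_sup(x, eps, 3, 6)
print(R3 <= R6, T36 <= R3 + R6 <= 2 * R6)
```

For the Haar kernel both inequalities hold deterministically, for every sample and every choice of signs. This is why no probability enters at this step.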
Proof of (25). Let $\mathcal F = \{1_{(-\infty,s]} : s \in \mathbb R\}$, and let $f \in \mathcal F$. We have

$$\sqrt n\int(p_n(\hat j_n) - p_0)f = \sqrt n\int(p_n(j_{\max}) - p_0)f + \sqrt n\int(p_n(\hat j_n) - p_n(j_{\max}))f.$$



The first term satisfies the CLT from Theorem 2 for the linear estimator with $j_n = j_{\max}$. We now show that the second term converges to zero in probability. Observe first

$$p_n(\hat j_n)(y) - p_n(j_{\max})(y) = P_n\left(K_{\hat j_n}(\cdot,y) - K_{j_{\max}}(\cdot,y)\right) = -\sum_{l=\hat j_n}^{j_{\max}-1}\sum_k\hat\beta_{lk}\psi_{lk}(y),$$

with convergence in $L^1(\mathbb R)$. Next, by (9.35) in HKPT (1998), for all $l \in [\hat j_n, j_{\max} - 1]$ and all $k$, and by definition of $\hat j_n$, we have for some $0 < D' < \infty$

$$\begin{aligned}
(1/D')2^{l/2}|\hat\beta_{lk}| &\le \sup_y\left|P_n(K_{l+1}(\cdot,y)) - P_n(K_l(\cdot,y))\right| = \|p_n(l+1) - p_n(l)\|_\infty\\
&\le \|p_n(l+1) - p_n(\hat j_n)\|_\infty + \|p_n(l) - p_n(\hat j_n)\|_\infty\\
&\le (1 + B(\phi))\left(R(n,l+1) + R(n,l)\right) + 3\sqrt{\hat M 2^l l/n}
\end{aligned}$$

in case $\hat j_n = \bar j_n$ or $\hat j_n = \bar j_n^{e}$, using also the inequality $T(n,\hat j_n,l) \le (1 + B(\phi))R(n,l)$ for $l > \hat j_n$ (see (48)). Consequently, uniformly in $f \in \mathcal F$,



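$$\begin{aligned}
E\left|\sqrt n\int(p_n(\hat j_n) - p_n(j_{\max}))f\right| &= E\left|\sqrt n\sum_{l=\hat j_n}^{j_{\max}-1}\sum_k\hat\beta_{lk}\int\psi_{lk}(y)f(y)\,dy\right|\\
&\le \sqrt n\,E\sum_{l=\hat j_n}^{j_{\max}-1}D'2^{-l/2}\left((B(\phi)+1)(R(n,l+1) + R(n,l)) + 3\sqrt{\hat M 2^l l/n}\right)\sum_k|\beta_{lk}(f)|\\
&\le c'\sum_{l\ge j_{\min}}\sqrt l\,2^{-l/2} \to 0,
\end{aligned}$$

using the moment bounds in II) and III), the fact that $\hat j_n \ge j_{\min} \to \infty$ as $n \to \infty$ (by definition of $\mathcal J$), and $\sup_{f\in\mathcal F}\sum_k|\beta_{lk}(f)| \le c2^{-l/2}$ by (43), for some constants $c, c'$.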
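In the Haar case the step that invokes (9.35) in HKPT (1998) can be checked by hand. The $\psi_{lk}$, $k\in\mathbb Z$, have disjoint supports on which $|\psi_{lk}| = 2^{l/2}$, and $p_n(l+1) - p_n(l) = \sum_k\hat\beta_{lk}\psi_{lk}$. Hence $\|p_n(l+1) - p_n(l)\|_\infty = 2^{l/2}\max_k|\hat\beta_{lk}|$, so $D' = 1$ works in that case. The sketch below verifies this identity numerically. The helper names are hypothetical and the code covers the Haar basis only, not the general statement used above.

```python
import numpy as np


def haar_hist(x, j):
    """p_n(j) for the Haar kernel as {k: value on [k 2^-j, (k+1) 2^-j)}."""
    x = np.asarray(x, dtype=float)
    k, c = np.unique(np.floor(x * 2.0**j).astype(int), return_counts=True)
    return dict(zip(k.tolist(), (2.0**j * c / len(x)).tolist()))


def haar_coefficients(x, l):
    """beta_hat_{lk} = (1/n) sum_i psi_{lk}(X_i), psi_{lk}(x) = 2^{l/2} psi(2^l x - k),
    with the Haar wavelet psi = 1_[0,1/2) - 1_[1/2,1)."""
    x = np.asarray(x, dtype=float)
    u = x * 2.0**l
    k = np.floor(u).astype(int)
    sgn = np.where(u - k < 0.5, 1.0, -1.0)
    uniq, inv = np.unique(k, return_inverse=True)
    sums = np.bincount(inv.ravel(), weights=sgn)
    return dict(zip(uniq.tolist(), (2.0 ** (l / 2) * sums / len(x)).tolist()))


def sup_increment(x, l):
    """||p_n(l+1) - p_n(l)||_inf; both functions are constant on level-(l+1) bins."""
    fine, coarse = haar_hist(x, l + 1), haar_hist(x, l)
    cells = {2 * k + r for k in coarse for r in (0, 1)}
    return max(abs(fine.get(b, 0.0) - coarse.get(b // 2, 0.0)) for b in cells)


rng = np.random.default_rng(2)
x = rng.normal(size=400)
for l in range(5):
    beta = haar_coefficients(x, l)
    print(l, sup_increment(x, l), 2 ** (l / 2) * max(abs(b) for b in beta.values()))
```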

Proof of (26) and (27):

The proof of the case $t = 0$ follows from a simple modification of the arguments below, as in Theorem 2 in Gine and Nickl (2008), so we omit it. [In this case, one defines $j^*$ as $j_{\max}$ if $t = 0$, so that only the case $\hat j_n \le j^*$ has to be considered.]

For $t > 0$, define $j^* := j^*(p_0)$ by the balance equation

$$j^* = \min\left\{j \in \mathcal J : B(j,p_0) \le \sqrt{2\log 2}\,\|p_0\|_\infty^{1/2}\|\Phi\|_2\,\sigma(j,n)\right\}. \tag{49}$$



Using the results from I), it is easily verified that $2^{j^*} \simeq (n/\log n)^{1/(2t+1)}$ if $p_0 \in C^t(M)$ for some $0 < t < r$, and that

$$\sigma(j^*,n) = O\left(\left(\frac{\log n}{n}\right)^{t/(2t+1)}\right)$$

is the rate of convergence required in (27).
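The balance equation (49) is a one-line search over the grid. The sketch below uses a hypothetical bias model $B(j,p_0) = 2^{-jt}$ and replaces $U(p_0,\Phi)$ by 1. Both choices serve only to show that $j^*$ lands near $\log_2 (n/\log n)^{1/(2t+1)}$. The function name is our own.

```python
import math


def balance_level(grid, bias, sig, U):
    """Smallest j in `grid` with bias(j) <= U * sig(j), mimicking (49).

    bias(j) plays the role of B(j, p0) and sig(j) that of sigma(j, n);
    returns None when no grid level satisfies the inequality.
    """
    for j in sorted(grid):
        if bias(j) <= U * sig(j):
            return j
    return None


# Hypothetical inputs: B(j, p0) = 2^{-j t} with t = 1, U = 1, n = 10^6.
n, t = 10**6, 1.0
j_star = balance_level(range(1, 21), lambda j: 2.0 ** (-j * t),
                       lambda j: math.sqrt(2.0**j * j / n), U=1.0)
print(j_star, math.log2((n / math.log(n)) ** (1 / (2 * t + 1))))
```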

We will consider the cases $\{\hat j_n \le j^*\}$ and $\{\hat j_n > j^*\}$ separately. First, if $\hat j_n$ is $\bar j_n$, then we have

$$\begin{aligned}
E\|p_n(\hat j_n) - p_0\|_\infty I_{\{\hat j_n\le j^*\}\cap\{\hat M\le 1.01M\}} &\le E\left(\|p_n(\hat j_n) - p_n(j^*)\|_\infty + \|p_n(j^*) - p_0\|_\infty\right)I_{\{\hat j_n\le j^*\}\cap\{\hat M\le 1.01M\}}\\
&\le (B(\phi) + 1)ER(n,j^*) + \sqrt{1.01M}\,\sigma(j^*,n) + E\|p_n(j^*) - p_0\|_\infty\\
&= O(\sigma(j^*,n)),
\end{aligned}\tag{50}$$

by the definition of $\bar j_n$, (48), the definitions of $\hat M$ and $j^*$, (44) and the moment bound in III). The same holds if $\hat j_n = \bar j_n^{e}$. If $\hat j_n$ is $\tilde j_n$ or $\tilde j_n^{e}$, then one has the same bound (without even using (48)). Also, by the results in I) and II),
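$$\begin{aligned}
E\|p_n(\hat j_n) - p_0\|_\infty I_{\{\hat j_n\le j^*\}\cap\{\hat M > 1.01M\}} &\le \sum_{j\in\mathcal J:\,j\le j^*}E\left[\left(\|p_n(j) - Ep_n(j)\|_\infty + B(j,p_0)\right)I_{\{\hat j_n = j\}}I_{\{\hat M > 1.01M\}}\right]\\
&\le c\log n\left[D\sigma(j^*,n) + B(j_{\min},p_0)\right]\sqrt{EI_{\{\hat M > 1.01M\}}}\\
&= O\left(\log n\,\exp\left(-\frac{(\log n)^2}{2L}\right)\right) = o(\sigma(j^*,n)).
\end{aligned}$$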



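We now turn to $\{\hat j_n > j^*\}$. First,

$$\begin{aligned}
E\|p_n(\hat j_n) - p_0\|_\infty I_{\{\hat j_n > j^*\}\cap\{\hat M < 0.99M\}} &\le \sum_{j\in\mathcal J:\,j > j^*}E\left[\left(\|p_n(j) - Ep_n(j)\|_\infty + B(j,p_0)\right)I_{\{\hat j_n = j\}}I_{\{\hat M < 0.99M\}}\right]\\
&\le c'\log n\left[D\sigma(j_{\max},n) + B(j^*,p_0)\right]\sqrt{EI_{\{\hat M < 0.99M\}}}\\
&= O\left(\log n\,\exp\left(-\frac{(\log n)^2}{2L}\right)\right) = o(\sigma(j^*,n)),
\end{aligned}$$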

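again by the results in I) and II). Second, for any $1 < p < \infty$ with $1/p + 1/q = 1$, using (44) and the definition of $j^*$,

$$\begin{aligned}
E\|p_n(\hat j_n) - p_0\|_\infty I_{\{\hat j_n > j^*\}\cap\{0.99M \le \hat M\}} &\le \sum_{j\in\mathcal J:\,j>j^*}\left(E\|p_n(j) - p_0\|_\infty^p\right)^{1/p}\left(EI_{\{\hat j_n = j\}\cap\{0.99M\le\hat M\}}\right)^{1/q}\\
&\le \sum_{j\in\mathcal J:\,j>j^*}D'\sigma(j,n)\Pr\left(\{\hat j_n = j\}\cap\{0.99M \le \hat M\}\right)^{1/q}.
\end{aligned}$$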



We show below that for $n$ large enough, some constant $c$, some $\delta > 0$ and some $q > 1$,

$$\Pr\left(\{\hat j_n = j\}\cap\{0.99M \le \hat M\}\right) \le c2^{-j(q/2+\delta)}, \tag{51}$$

which gives the bound

$$\sum_{j\in\mathcal J:\,j>j^*}D'c^{1/q}\sigma(j,n)\,2^{-j(1/2+\delta/q)} = O\left(\frac{1}{\sqrt n}\right) = o(\sigma(j^*,n)),$$

completing the proof, modulo verification of (51).

To verify (51), we split the proof into two cases. Pick any $j \in \mathcal J$ with $j > j^*$ and denote by $j^-$ the previous element in the grid (i.e. $j^- = j - 1$).

Case I, $\hat j_n = \bar j_n$ or $\hat j_n = \bar j_n^{e}$:

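We give the proof for $\bar j_n$ only, as the proof for $\bar j_n^{e}$ is the same given Corollary 1. One has

$$\Pr\left(\{\bar j_n = j\}\cap\{0.99M \le \hat M\}\right) \le \sum_{l\in\mathcal J:\,l>j^-}\Pr\left(\|p_n(j^-) - p_n(l)\|_\infty > T(n,j^-,l) + \sqrt{0.99M}\,\sigma(l,n)\right).$$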



We first observe that

$$\|p_n(j^-) - p_n(l)\|_\infty \le \|p_n(j^-) - p_n(l) - Ep_n(j^-) + Ep_n(l)\|_\infty + B(j^-,p_0) + B(l,p_0), \tag{52}$$



where, setting $\sqrt{2\log 2}\,\|p_0\|_\infty^{1/2}\|\Phi\|_2 =: U(p_0,\Phi)$,

$$B(j^-,p_0) + B(l,p_0) \le 2B(j^*,p_0) \le 2U(p_0,\Phi)\sigma(j^*,n) \le 2U(p_0,\Phi)\sigma(l,n)$$

by definition of $j^*$ and since $l > j^- \ge j^*$. Consequently, the $l$-th probability in the last sum is bounded by



$$\Pr\left(\|p_n(j^-) - p_n(l) - Ep_n(j^-) + Ep_n(l)\|_\infty > T(n,j^-,l) + \left(\sqrt{0.99M} - 2U(p_0,\Phi)\right)\sigma(l,n)\right), \tag{53}$$

and we now apply Corollary 1 to this bound. Define the class of functions

$$\mathcal F := \mathcal F_{j^-,l} := \left\{2^{-l}\left(K_{j^-}(\cdot,y) - K_l(\cdot,y)\right)/(4\|\Phi\|_\infty) : y \in \mathbb R\right\},$$

which is uniformly bounded by $1/2$ and satisfies (…) for some $A$ and $v$ independent of $l$ and $j$, by Lemma 2 (and a simple computation on covering numbers). We compute $\sigma$, using (38) and $l > j^-$:



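$$\begin{aligned}
2^{-2l}E\left(K_{j^-} - K_l\right)^2(X,y) &\le 2^{-2l+1}\left[EK_{j^-}^2(X,y) + EK_l^2(X,y)\right]\\
&\le 2^{-2l+1}\|\Phi\|_2^2\|p_0\|_\infty\left(2^{j^-} + 2^l\right) \le 3\cdot 2^{-l}\|\Phi\|_2^2\|p_0\|_\infty,
\end{aligned}$$

so that we can take

$$\sigma^2 = 3\cdot 2^{-l}\,\frac{\|\Phi\|_2^2\|p_0\|_\infty}{16\|\Phi\|_\infty^2}.$$

Then the probability in (53) is equal to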



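$$\Pr\left(\left\|\sum_{i=1}^n\left(f(X_i) - Pf\right)\right\|_{\mathcal F} > 2\left\|\sum_{i=1}^n\varepsilon_if(X_i)\right\|_{\mathcal F} + \frac{n\left(\sqrt{0.99M} - 2U(p_0,\Phi)\right)\sigma(l,n)}{2^l\cdot 4\|\Phi\|_\infty}\right).$$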



Since $n\sigma^2/\log(1/\sigma) \simeq n/(2^l l) \to \infty$ uniformly in $l \in \mathcal J$, there exists $\lambda_n \to \infty$ independent of $l$ such that (…) is satisfied, and the choice

$$t = \frac{n\left(\sqrt{0.99M} - 2U(p_0,\Phi)\right)\sigma(l,n)}{2^l\cdot 4\|\Phi\|_\infty}$$

is admissible in Corollary 1 for $c_2(\lambda_n) = 1 + 120\lambda_n^{-1} + 10800\lambda_n^{-2}$. Hence, using Corollary 1, the last probability is bounded by



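$$\le 2\exp\left(-\frac{n^2\left(\sqrt{0.99M} - 2U(p_0,\Phi)\right)^2(2^l l/n)\,16\|\Phi\|_\infty^2}{9\cdot 6.3\cdot c_2(\lambda_n)\,2^{2l}\,n\,2^{-l}\|\Phi\|_2^2\|p_0\|_\infty\,16\|\Phi\|_\infty^2}\right) \le 2^{-l((q/2)+\delta)} \tag{54}$$

for some $\delta > 0$ and $q > 1$, by definition of $M$. Since $\sum_{l\in\mathcal J:\,l>j^-}2^{-l((q/2)+\delta)} \le c2^{-j(q/2+\delta)}$, we have proved (51).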
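The summation used in the last sentence is a geometric tail. Spelled out, with $a := q/2 + \delta > 0$ and recalling that $l > j^-$ means $l \ge j$:

```latex
\sum_{l \in \mathcal J:\, l \ge j} 2^{-la}
  \;\le\; \sum_{l=j}^{\infty} 2^{-la}
  \;=\; \frac{2^{-ja}}{1 - 2^{-a}},
\qquad\text{so one may take } c = \left(1 - 2^{-a}\right)^{-1}.
```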
Case II, $\hat j_n = \tilde j_n$ or $\hat j_n = \tilde j_n^{e}$:

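We again only prove the case $\hat j_n = \tilde j_n$, in which one has

$$\begin{aligned}
\Pr\left(\{\tilde j_n = j\}\cap\{0.99M \le \hat M\}\right) &\le \sum_{l\in\mathcal J:\,l>j^-}\Pr\left(\|p_n(j^-) - p_n(l)\|_\infty > (B(\phi) + 1)R(n,l) + \sqrt{0.99M}\,\sigma(l,n)\right)\\
&\le \sum_{l\in\mathcal J:\,l>j^-}\Pr\left(\|p_n(j^-) - p_n(l)\|_\infty > T(n,j^-,l) + \sqrt{0.99M}\,\sigma(l,n)\right),
\end{aligned}$$

by inequality (48). The proof now reduces to the previous case.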
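The first display above uses the threshold of the $R$-based rule. A minimal sketch of such a Lepski-type search follows, written against abstract callables. The interface is our own and hypothetical. The actual definition of $\tilde j_n$, including the grid $\mathcal J$, is the one given earlier in the paper and may differ in details.

```python
import math


def lepski_level(grid, sup_diff, R, sigma, M_hat, B_phi=1.0):
    """Smallest j in `grid` passing every test against finer levels l > j:

        sup_diff(j, l) <= (B_phi + 1) * R(l) + sqrt(M_hat) * sigma(l),

    where sup_diff(j, l) stands for ||p_n(j) - p_n(l)||_inf, R(l) for R(n, l)
    and sigma(l) for sigma(l, n). The finest level passes vacuously.
    """
    levels = sorted(grid)
    for idx, j in enumerate(levels):
        if all(sup_diff(j, l) <= (B_phi + 1.0) * R(l) + math.sqrt(M_hat) * sigma(l)
               for l in levels[idx + 1:]):
            return j
    return None  # only reached for an empty grid


# Toy example: level 1 is rejected by a large discrepancy, level 2 passes.
toy_diff = lambda j, l: 10.0 if j < 2 else 0.0
print(lepski_level([1, 2, 3, 4], toy_diff, lambda l: 0.1, lambda l: 0.1, M_hat=1.0))
```

Returning the first accepted level matches the "smallest $j$" convention. This is also why the proof only has to control the events $\{\tilde j_n = j\}$ for $j > j^*$ through rejections at the previous level $j^-$.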
5.5 Proofs for Section 4.1

The proofs will use three technical lemmas that we give at the end of the section. 

Proofs of Propositions 3 and 4

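We have that

$$\begin{aligned}
E\|p_n(\hat j_n) - p_0\|_\infty &\le 8E\|p_n(j^*) - Ep_n(j^*)\|_\infty + \sqrt{1.01M}\,\sigma(j^*,n) + E\|p_n(j^*) - p_0\|_\infty + O(1/\sqrt n)\\
&\le 8E\|p_n(j^*) - Ep_n(j^*)\|_\infty + \sqrt{1.01M}\,S(p_0)^{-1}E\|p_n(j^*) - Ep_n(j^*)\|_\infty + E\|p_n(j^*) - p_0\|_\infty\\
&\qquad + \sqrt{1.01M\,2^{l(p_0)}l(p_0)/n} + O(1/\sqrt n) + O\left((n/\log n)^{-2t/(2t+1)}\right)\\
&\le \left(9 + \sqrt{1.01M}\,S(p_0)^{-1}\right)E\|p_n(j^*) - p_0\|_\infty + O(1/\sqrt n) + O\left((n/\log n)^{-2t/(2t+1)}\right).
\end{aligned}$$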

The first inequality follows from collecting the bounds from the proof of Theorem 3 (in particular (50)) and desymmetrization (17) (using also $\|K_j(p_0)\|_\infty \le \|p_0\|_\infty$). The second inequality follows from

$$\sigma(j^*,n)\left(I_{\{j^* < l(p_0)\}} + I_{\{j^*\ge l(p_0)\}}\right) \le \sqrt{\frac{2^{l(p_0)}l(p_0)}{n}} + S(p_0)^{-1}\left(E\|p_n(j^*) - Ep_n(j^*)\|_\infty + \frac{c''2^{j^*}\log n}{n}\right)$$

and Lemma 8 below, using also that $2^{j^*} \simeq (n/\log n)^{1/(2t+1)}$. The last inequality follows from Remark 3 (noting that $p_n(l) - Ep_n(l) = \pi_l(p_n(l) - p_0)$).

This already proves Proposition 3. Proposition 4 follows from the first inequality in the last display, the law of the logarithm for the Haar wavelet density estimator (…) and the definition of $j^*$, after some computations involving an upper bound for $(n/\log n)^{t/(2t+1)}\sqrt{2^{j^*}j^*/n}$.

Proofs of Propositions 5 and 6

We set shorthand

$$E(l) := E\|p_n(l) - Ep_n(l)\|_\infty$$

and note that $E(l) \le E(j)$ holds for $l \le j$ in view of $p_n(l) - Ep_n(l) = \pi_l(p_n(j) - Ep_n(j))$ and Remark 3. Also, in the case of the Haar wavelet we can in fact take

$$B(l,p_0) := 2^{-lt}H(t,p_0)/(t + 1) \tag{55}$$

in the bound (45). Define $j^\# := j^\#(p_0,n)$ by

$$j^\# = \operatorname*{arg\,min}_{l\in\mathcal J}\max\left(E(l), B(l,p_0)\right). \tag{56}$$



Since $B(l,p_0)$ decreases and $E(l)$ is nondecreasing as $l$ increases, $j^\#$ exists (if the minimizer is not unique we take the smallest one), and

$$B(l,p_0) \le E(l) = E\|p_n(l) - Ep_n(l)\|_\infty \quad\text{for all } l > j^\#. \tag{57}$$



To see the latter, suppose to the contrary that $E(l) < B(l,p_0)$ for some $l > j^\#$. Then, since $B(l,p_0) < B(j^\#,p_0)$ by strict monotonicity, $l$ is a point where $\max(B(l,p_0), E(l)) = B(l,p_0) < B(j^\#,p_0) \le \max(B(j^\#,p_0), E(j^\#))$, a contradiction. We also note that by Lemma 7 below one has $2^{j^\#(p_0,n)} \simeq (n/\log n)^{1/(2t+1)}$.
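Definition (56) and property (57) are easy to mimic numerically. The sketch below uses hypothetical stand-ins $E(l) = \sqrt{2^ll/n}$ and $B(l,p_0) = 2^{-l}$, not the actual $E(l)$, which is an unknown expectation. It computes $j^\#$ with the smallest-minimizer convention and checks (57).

```python
import math


def balanced_oracle_level(levels, E, B):
    """j# = smallest minimizer over `levels` of max(E(l), B(l)), cf. (56)."""
    levels = sorted(levels)
    values = [max(E(l), B(l)) for l in levels]
    return levels[values.index(min(values))]   # index() returns the first minimizer


# Hypothetical inputs: E(l) = sqrt(2^l l / n) (nondecreasing), B(l) = 2^{-l} (decreasing).
n = 10**5
E = lambda l: math.sqrt(2.0**l * l / n)
B = lambda l: 2.0 ** (-l)
j_sharp = balanced_oracle_level(range(1, 16), E, B)
# Property (57): beyond j#, the bias never exceeds the stochastic term.
print(j_sharp, all(B(l) <= E(l) for l in range(j_sharp + 1, 16)))
```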

Define $\hat M = 10^2\|p_n(j_{\max})\|_\infty$ and $M = 10^2\|p_0\|_\infty$. We note in advance that, as in the proof of Theorem 3,



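$$E\|p_n(\hat j_n) - p_0\|_\infty I_{\{\hat j_n\le j^\#\}\cap\{\hat M > 1.01M\}} = O\left(\log n\,\exp\left(-\frac{(\log n)^2}{L}\right)\right) = O(n^{-\beta})$$

as well as

$$E\|p_n(\hat j_n) - p_0\|_\infty I_{\{\hat j_n > j^\#\}\cap\{\hat M < 0.99M\}} = O\left(\log n\,\exp\left(-\frac{(\log n)^2}{L}\right)\right) = O(n^{-\beta})$$

for every $\beta > 0$. Hence it remains to consider the cases $I_{\{\hat j_n\le j^\#\}\cap\{\hat M\le 1.01M\}}$ and $I_{\{\hat j_n > j^\#\}\cap\{\hat M\ge 0.99M\}}$. First we have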



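$$\begin{aligned}
E\|p_n(\hat j_n) - p_0\|_\infty I_{\{\hat j_n\le j^\#\}\cap\{\hat M\le 1.01M\}} &\le E\left(\|p_n(\hat j_n) - p_n(j^\#)\|_\infty + \|p_n(j^\#) - p_0\|_\infty\right)I_{\{\hat j_n\le j^\#\}\cap\{\hat M\le 1.01M\}}\\
&\le 5ER(n,j^\#) + \sqrt{1.01M\frac{2^{j^\#}j^\#}{n}} + E\|p_n(j^\#) - p_0\|_\infty\\
&\le 20E\|p_n(j^\#) - Ep_n(j^\#)\|_\infty + \sqrt{1.01M\frac{2^{j^\#}j^\#}{n}} + E\|p_n(j^\#) - p_0\|_\infty + \frac{10\|p_0\|_\infty}{\sqrt n}
\end{aligned}$$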

by the definition of $\hat j_n$ and desymmetrization (17) (using also $\|K_j(p_0)\|_\infty \le \|p_0\|_\infty$). Second, using (44) and (57), and with $1/p + 1/q = 1$, $q > 1$ arbitrary,



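$$\begin{aligned}
E\|p_n(\hat j_n) - p_0\|_\infty I_{\{\hat j_n > j^\#\}\cap\{0.99M\le\hat M\}} &\le \sum_{j\in\mathcal J:\,j>j^\#}2\left(E\|p_n(j) - Ep_n(j)\|_\infty^p\right)^{1/p}\left(EI_{\{\hat j_n = j\}\cap\{0.99M\le\hat M\}}\right)^{1/q}\\
&\le \sum_{j\in\mathcal J:\,j>j^\#}2D\sqrt{\frac{2^jj}{n}}\,\Pr\left(\{\hat j_n = j\}\cap\{0.99M\le\hat M\}\right)^{1/q}.
\end{aligned}$$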



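We show in Lemma 5 below that for $n$ large enough, some $q > 1$ and some constant $c'$,

$$\Pr\left(\{\hat j_n = j\}\cap\{0.99M\le\hat M\}\right) \le c'2^{-j(q/2+\delta)} \tag{58}$$

for some $\delta > 0$, which gives the bound

$$\sum_{j\in\mathcal J:\,j>j^\#}2D(c')^{1/q}\sqrt{\frac{2^jj}{n}}\,2^{-j(1/2+\delta/q)} = O\left(\frac{1}{\sqrt n}\right).$$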
Combining these bounds we have established

$$E\|p_n(\hat j_n) - p_0\|_\infty \le 20E\|p_n(j^\#) - Ep_n(j^\#)\|_\infty + \sqrt{1.01M\frac{2^{j^\#}j^\#}{n}} + E\|p_n(j^\#) - p_0\|_\infty + O\left(\frac{1}{\sqrt n}\right). \tag{59}$$

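Let next $l(p_0)$ be as in Lemma 8; then

$$\sqrt{\frac{2^{j^\#}j^\#}{n}}\left(I_{\{j^\# < l(p_0)\}} + I_{\{j^\#\ge l(p_0)\}}\right) \le \sqrt{\frac{2^{l(p_0)}l(p_0)}{n}} + S(p_0)^{-1}\left(E\|p_n(j^\#) - Ep_n(j^\#)\|_\infty + \frac{c''2^{j^\#}\log n}{n}\right),$$

so that (59) becomes

$$E\|p_n(\hat j_n) - p_0\|_\infty \le \left(20 + \sqrt{1.01M}\,S(p_0)^{-1}\right)E\|p_n(j^\#) - Ep_n(j^\#)\|_\infty + E\|p_n(j^\#) - p_0\|_\infty + O\left(\frac{1}{\sqrt n}\right) + O\left(\left(\frac{\log n}{n}\right)^{2t/(2t+1)}\right),$$

where we have also used Lemma 7. This completes the proof of Proposition 5, using $p_n(l) - Ep_n(l) = \pi_l(p_n(l) - p_0)$ and Remark 3, after computing the constant.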

We now prove Proposition 6. Let $j_H$ be the resolution level of the oracle. Then we have, from Lemma 6 and by definition of $j^\#$,

$$\begin{aligned}
E\|p_n(j_H) - p_0\|_\infty &\ge W(j_H,p_0)\max\left(E(j_H), B(j_H,p_0)\right)\\
&\ge W(j_H,p_0)\max\left(E(j^\#), B(j^\#,p_0)\right),
\end{aligned}$$

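which, noting that

$$E\|p_n(j^\#) - Ep_n(j^\#)\|_\infty = E(j^\#) \le \max\left(E(j^\#), B(j^\#,p_0)\right)$$

and, using again Lemma 6,

$$E\|p_n(j^\#) - p_0\|_\infty \le 2\max\left(E(j^\#), B(j^\#,p_0)\right),$$

gives

$$E\|p_n(\hat j_n) - p_0\|_\infty \le \frac{22 + \sqrt{1.01M}/S(p_0)}{W(j_H,p_0)}E\|p_n(j_H) - p_0\|_\infty + O\left(\frac{1}{\sqrt n}\right) + O\left(\left(\frac{\log n}{n}\right)^{2t/(2t+1)}\right),$$

completing the proof by definition of $M$, given the following lemmas.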

Lemma 5 Let $\hat j_n$ and $\mathcal J$ be defined as in (29) and (30), respectively. Then, for every $j > j^\#$ and $n$ large enough (independent of $j$), (58) holds.

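Proof. Pick any $j \in \mathcal J$ with $j > j^\#$ and denote by $j^-$ the previous element in the grid (i.e. $j^- = j - 1$). Then

$$\Pr\left(\{\hat j_n = j\}\cap\{0.99M\le\hat M\}\right) \le \sum_{l\in\mathcal J:\,l>j^-}\Pr\left(\|p_n(j^-) - p_n(l)\|_\infty > 5R(n,l) + \sqrt{0.99M\frac{2^ll}{n}}\right).$$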



We first observe that

$$\|p_n(j^-) - p_n(l)\|_\infty \le \|p_n(j^-) - p_n(l) - Ep_n(j^-) + Ep_n(l)\|_\infty + B(j^-,p_0) + B(l,p_0), \tag{61}$$

where

$$B(j^-,p_0) = 2^tB(j^- + 1,p_0) \le 2E(j^- + 1) \le 2E(l), \quad\text{and}\quad B(l,p_0) \le E(l)$$

by (57), since $j^- + 1 = j > j^\#$ and since $l \ge j > j^\#$. Consequently, the $l$-th probability in the last sum is bounded by



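$$\begin{aligned}
&\Pr\left(\|p_n(j^-) - p_n(l) - Ep_n(j^-) + Ep_n(l)\|_\infty > 5R(n,l) - 3E(l) + \sqrt{0.99M\frac{2^ll}{n}}\right)\\
&\quad\le \Pr\left(\|p_n(j^-) - p_n(l) - Ep_n(j^-) + Ep_n(l)\|_\infty > 2R(n,l) + (1-a)\sqrt{0.99M\frac{2^ll}{n}}\right)\\
&\qquad + \Pr\left(3R(n,l) - 3E(l) < -a\sqrt{0.99M\frac{2^ll}{n}}\right) =: A + B.
\end{aligned}$$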



The term $A$ is dominated by

$$\Pr\left(\|p_n(j^-) - p_n(l) - Ep_n(j^-) + Ep_n(l)\|_\infty > T(n,j^-,l) + (1-a)\sqrt{0.99M\frac{2^ll}{n}}\right)$$

in view of (48) and Remark 2, and we apply Corollary 1 to this probability. Arguing as in the bound (54) for (53), this probability is bounded by

$$\exp\left(-\frac{(1-a)^2\cdot 99\,l}{9\cdot 6.3\,c_2(\lambda_n)}\right) \le 2^{-l(q/2+\delta)}$$

for some $\delta > 0$ and $q > 1$ if $a = 0.5359$.

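Next, by symmetrization (17), the $B$-term is less than or equal to

$$\Pr\left(R(n,l) < ER(n,l) - \frac a3\sqrt{0.99M\frac{2^ll}{n}}\right) = \Pr\left(\left\|\sum_{i=1}^n\varepsilon_if(X_i)\right\|_{\mathcal F} < E\left\|\sum_{i=1}^n\varepsilon_if(X_i)\right\|_{\mathcal F} - \frac{n}{2^{l+2}}\cdot\frac a3\sqrt{0.99M\frac{2^ll}{n}}\right),$$

where $\mathcal F := \mathcal F_l = \{2^{-l}K_l(\cdot,y)/2 : y\in\mathbb R\}$. Applying (20) and the variance computation (38) (and choosing $\lambda_n \to \infty$ as in Corollary 1), we have that this probability is bounded by

$$\exp\left(-\frac{a^2\cdot 99\,l}{36\cdot 2.1\,c_2(\lambda_n)}\right) \le 2^{-l(q/2+\delta)}.$$

Finally, since $\sum_{l\in\mathcal J:\,l>j^-}c2^{-l(q/2+\delta)} \le c'2^{-j(q/2+\delta)}$, we have proved the lemma. ■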
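The choice $a = 0.5359$ balances the two exponents. Read the exponents of the $A$- and $B$-bounds as $(1-a)^2\cdot 99\,l/(9\cdot 6.3\,c_2(\lambda_n))$ and $a^2\cdot 99\,l/(36\cdot 2.1\,c_2(\lambda_n))$ (our reading of the displays) and let $c_2(\lambda_n) \to 1$. The arithmetic check below then shows that both constants are about 0.376 and exceed $\log 2/2 \approx 0.347$, which is what is needed for some $q > 1$.

```python
import math

a = 0.5359
# Exponent constants of the A- and B-terms in the proof of Lemma 5,
# taken in the limit c_2(lambda_n) -> 1 (our reading of the displays).
expo_A = (1 - a) ** 2 * 99 / (9 * 6.3)
expo_B = a ** 2 * 99 / (36 * 2.1)
# exp(-expo * l) <= 2^{-l (q/2 + delta)} for some q > 1, delta > 0
# exactly when expo > log(2) / 2.
needed = math.log(2) / 2
print(f"{expo_A:.4f} {expo_B:.4f} {needed:.4f}")
```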

Lemma 6 Let $W(l,p_0)$ be as in (31) and let $B(l,p_0)$ be as in (55). Let the conditions of Proposition 6 be satisfied. Then $W(l,p_0) > 0$ for every $l \in \mathbb N$. Furthermore, we have for every $l, n \in \mathbb N$ that

$$W(l,p_0)\max\left(E(l), B(l,p_0)\right) \le E\|p_n(l) - p_0\|_\infty \le 2\max\left(E(l), B(l,p_0)\right).$$
Proof. We first make the general observation that

$$\begin{aligned}
\|Ep_n(l) - p_0\|_\infty &= \sup_{x\in\mathbb R}\left|\int K(2^lx, 2^lx + u)\left(p_0(x + 2^{-l}u) - p_0(x)\right)du\right|\\
&\le 2^{-lt}\sup_{x\in\mathbb R}\int_{k(x)-2^lx}^{k(x)-2^lx+1}\frac{|p_0(x + 2^{-l}u) - p_0(x)|}{|2^{-l}u|^t}\,|u|^t\,du \le B(l,p_0),
\end{aligned}\tag{62}$$

where $k(x)$ is as before (31). To prove the first claim of the lemma, assume $W(l,p_0) < 1$, in which case

$$W(l,p_0) = \frac{(t+1)2^{lt}}{H(t,p_0)}\|Ep_n(l) - p_0\|_\infty.$$

For any bounded $f$,

$$\sup_m\sup_k 2^m\left|\int f(x)\psi(2^mx - k)\,dx\right| = \sup_m\sup_k\left|\int f\left(\frac{u + k}{2^m}\right)\psi(u)\,du\right| \le \|f\|_\infty,$$

which, applied to $f = Ep_n(l) - p_0$ and since $p_0$ is uniformly continuous (so that its wavelet series converges uniformly), gives

$$\|Ep_n(l) - p_0\|_\infty \ge \sup_{m\ge l}\sup_k 2^{m/2}|\beta_{mk}(p_0)|. \tag{63}$$



Suppose $|\beta_{mk}(p_0)| = 0$ for all $m \ge l$ and all $k$. Then $p_0 \in V_l$ is a piecewise constant function, which is impossible since $p_0$ is a uniformly continuous density. We conclude that $W(l,p_0) > 0$ for all $l \in \mathbb N$.
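The bound used for (63) needs only $\int|\psi| = 1$ after the change of variables $u = 2^mx - k$. A quick numerical sanity check with the midpoint rule follows. The test function $f$ is hypothetical and satisfies $\|f\|_\infty \le 1.5$.

```python
import numpy as np


def scaled_haar_coefficient(f, m, k, grid_points=2000):
    """2^{m/2} <f, psi_{mk}> = int_0^1 f((u + k) / 2^m) psi(u) du, midpoint rule."""
    u = (np.arange(grid_points) + 0.5) / grid_points
    psi = np.where(u < 0.5, 1.0, -1.0)          # Haar wavelet on [0, 1)
    return float(np.mean(f((u + k) / 2.0**m) * psi))


def f(x):
    """Hypothetical bounded test function with ||f||_inf <= 1.5."""
    return np.cos(3 * x) + 0.5 * np.sign(x)


worst = max(abs(scaled_haar_coefficient(f, m, k)) for m in range(6) for k in range(-8, 8))
print(worst)   # never exceeds sup|f| = 1.5, since int |psi| = 1
```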

For the upper bound, note that

$$E\|p_n(l) - p_0\|_\infty \le E\|p_n(l) - Ep_n(l)\|_\infty + \|Ep_n(l) - p_0\|_\infty \le 2\max\left(E(l), B(l,p_0)\right),$$

by the inequality in (62).

For the lower bound, we have $p_n(l) - Ep_n(l) = \pi_l(p_n(l) - p_0)$, so that (cf. Remark 3)

$$E(l) = E\|p_n(l) - Ep_n(l)\|_\infty \le E\|p_n(l) - p_0\|_\infty.$$

Second, we have by Jensen's inequality

$$E\|p_n(l) - p_0\|_\infty = E\|p_n(l) - Ep_n(l) + Ep_n(l) - p_0\|_\infty \ge \|Ep_n(l) - p_0\|_\infty,$$

so that $E\|p_n(l) - p_0\|_\infty \ge W(l,p_0)B(l,p_0)$ by (62) and the definition of $B(l,p_0)$. Combining these bounds completes the proof. ■



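Lemma 7 We have, for all $n$ large enough, that

$$2^{j^\#} \simeq \left(\frac{n}{\log n}\right)^{1/(2t+1)},$$

so that in particular

$$c\left(\frac{n}{\log n}\right)^{-t/(2t+1)} \le \sqrt{\frac{2^{j^\#}j^\#}{n}} \le C\left(\frac{n}{\log n}\right)^{-t/(2t+1)}$$

for some constants $0 < c < C < \infty$.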
Proof. In view of Lemma 8, (44) and the definition of $\mathcal J$, we have, for appropriate constants,

$$d\sqrt{\frac{2^jj}{n}} \le E(j) \le d'\sqrt{\frac{2^jj}{n}}$$

as soon as $j_{\min} \ge l(p_0)$; the result follows from the definition of $j^\#$ and of $B(j,p_0)$.
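Lemma 8 There exists $l(p_0)$ finite such that for every $l \ge l(p_0)$ and every $n$ we have

$$E\|p_n(l) - Ep_n(l)\|_\infty \ge S(p_0)\sqrt{\frac{2^ll}{n}} - \frac{c''2^l\log n}{n},$$

where

$$S(p_0) = \sqrt{\frac{\|p_0\|_\infty}{4\pi\log 2}},$$

and where $0 < c'' < \infty$ does not depend on $n$ or $l$.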

Proof. Define $l(p_0)$ as the smallest integer $l$ for which the following four conditions hold:

a) If $U(p_0)$ is the largest interval in $\{x \in \mathbb R : p_0(x) > \|p_0\|_\infty/2\}$ and $|U(p_0)|$ denotes its length, then $l$ is such that $|U(p_0)| > 2/2^l$. Note that such an $l$ always exists by uniform continuity of $p_0$.

b) $\|p_0\|_\infty \le (1 - r)2^l$, where $r = 50/51$.

c) $l(\log 2 - 0.68) > -\log(|U(p_0)|/2)$.

d) $\sqrt{\dfrac{0.68\,l}{2\log 2}} - \sqrt 2 > \sqrt{\dfrac{0.51\,l}{2\log 2}}$.



We now prove the lemma. We start with a reduction to Gaussian processes. By Theorem 3 in Komlos, Major and Tusnady (1975) and by integrating tail probabilities, there exists a sequence of Brownian bridges $B_n$ such that

$$E\|F_n - F - n^{-1/2}B_n\circ F\|_\infty = O(\log n/n),$$



where we have used that $F$ is a continuous distribution function. Note next that for each $y \in \mathbb R$ there exists $k := k(y)$ such that

$$\begin{aligned}
p_n(l,y) - Ep_n(l,y) &= \frac{2^l}{n}\sum_{i=1}^n\left(1_{[k/2^l,(k+1)/2^l)}(X_i) - E1_{[k/2^l,(k+1)/2^l)}(X)\right)\\
&= 2^l\left[(F_n - F)\left(\frac{k+1}{2^l}-\right) - (F_n - F)\left(\frac{k}{2^l}-\right)\right].
\end{aligned}$$
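The identity above says that a Haar histogram value is $2^l$ times the increment of the empirical distribution function over a dyadic bin. The following is a small check of its empirical part, with hypothetical helper names. The centred version subtracts the same identity for $F$.

```python
import numpy as np


def ecdf_left(x, s):
    """F_n(s-) = (1/n) #{i : X_i < s}."""
    return float(np.mean(np.asarray(x) < s))


def haar_value(x, l, y):
    """p_n(l)(y) for the Haar kernel: 2^l times the fraction of X_i in the dyadic bin of y."""
    k = np.floor(y * 2.0**l)
    return 2.0**l * float(np.mean(np.floor(np.asarray(x) * 2.0**l) == k))


rng = np.random.default_rng(3)
x = rng.normal(size=300)
l, y = 3, 0.37
k = np.floor(y * 2.0**l)
lhs = haar_value(x, l, y)
rhs = 2.0**l * (ecdf_left(x, (k + 1) / 2.0**l) - ecdf_left(x, k / 2.0**l))
print(lhs, rhs)
```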

Consequently, for

$$G_n(k) := G_{n,l}(k) = B_n\circ F((k+1)/2^l) - B_n\circ F(k/2^l)$$

and $G_n(y) = G_n(k(y))$, we have

$$E\left\|p_n(l) - Ep_n(l) - \frac{2^l}{\sqrt n}G_n\right\|_\infty \le 2^{l+1}E\|F_n - F - n^{-1/2}B_n\circ F\|_\infty = O\left(\frac{2^l\log n}{n}\right).$$
Therefore

$$E\|p_n(l) - Ep_n(l)\|_\infty \ge \frac{2^l}{\sqrt n}E\sup_{k\in\mathbb Z}|G_n(k)| - \frac{c''2^l\log n}{n}$$

for some $c''$ finite independent of $l$ and $n$. We now lower-bound the Gaussian expectation, writing shorthand $P_k = P(X \in (k/2^l, (k+1)/2^l])$. We see that, for any $k \ne k' \in \mathbb Z$,

$$E(G_n(k) - G_n(k'))^2 = P_k + P_{k'} - (P_k - P_{k'})^2 \ge r(P_k + P_{k'})$$

if $P_k, P_{k'}$ are both less than or equal to $1 - r$, $0 < r < 1$, which happens by b) in the definition of $l(p_0)$. Consider the Gaussian process $G(k) = \sqrt{rP_k}\,g_k$, where the $g_k$'s are i.i.d. standard normal. Then, by the above inequality, $E(G(k) - G(k'))^2 \le E(G_n(k) - G_n(k'))^2$, and by Gaussian comparison (Theorem 3.2.5 together with Example 3.2.7b in Fernique (1997)) we have

$$E\sup_{k\in A}|G_n(k)| \ge E\sup_{k\in A}G_n(k) \ge E\sup_{k\in A}G(k) \ge E\sup_{k\in A}|G(k)| - \sqrt{2/\pi}\,\sup_{k\in A}\sqrt{EG(k)^2}$$

for any $A \subset \mathbb Z$.

Let now $U(p_0)$ be the interval from a). By hypothesis, $|U(p_0)| > 2/2^l$, and therefore

$$\operatorname{card}\{k \in \mathbb Z : k/2^l \in U(p_0)\} \ge 2^{l-1}|U(p_0)| > 1.$$

Then, taking $A = 2^lU(p_0)\cap\mathbb Z$ in the Gaussian comparison inequality, we conclude

$$E\sup_{k\in\mathbb Z}|G_n(k)| \ge E\max_{k\in 2^lU(p_0)\cap\mathbb Z}\left(\sqrt{rP_k}\,|g_k|\right) - \sqrt{\frac{2r\|p_0\|_\infty}{\pi 2^l}}.$$

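Furthermore, by Fernique (1997, p. 27, expression 1.7.1),

$$\begin{aligned}
E\max_{k\in 2^lU(p_0)\cap\mathbb Z}\left(\sqrt{rP_k}\,|g_k|\right) &\ge \min_{k\in 2^lU(p_0)\cap\mathbb Z}\sqrt{rP_k}\;E\max_{k\in 2^lU(p_0)\cap\mathbb Z}|g_k|\\
&\ge \sqrt{\frac{r\|p_0\|_\infty\log(2^{l-1}|U(p_0)|)}{2^{l+1}\pi\log 2}} \ge \sqrt{\frac{r\|p_0\|_\infty\,0.68\,l}{2^{l+1}\pi\log 2}},
\end{aligned}$$

again by condition c) in the definition of $l(p_0)$. Hence, by the last condition on $l(p_0)$, we have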



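$$E\sup_{k\in\mathbb Z}|G_n(k)| \ge \sqrt{\frac{r\,0.51\|p_0\|_\infty\,l}{2\pi\log 2\cdot 2^l}},$$

and since $r = 0.5/0.51$,

$$E\|p_n(l) - Ep_n(l)\|_\infty \ge \sqrt{\frac{\|p_0\|_\infty}{4\pi\log 2}}\sqrt{\frac{2^ll}{n}} - \frac{c''2^l\log n}{n},$$

which completes the proof.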
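As we read it, the lower bound from Fernique (1997, 1.7.1) used above is $E\max_{k\le N}g_k \ge \sqrt{\log N/(\pi\log 2)}$ for i.i.d. standard normal $g_k$. For $N = 2$ it is an equality, since $E\max(g_1,g_2) = 1/\sqrt\pi$. This gives a quick sanity check on the constant. The following Monte Carlo sketch compares the bound with simulated values. The sample sizes are chosen arbitrarily.

```python
import numpy as np


def fernique_bound(N):
    """sqrt(log N / (pi log 2)): lower bound for E max_{k <= N} g_k, g_k iid N(0, 1)."""
    return float(np.sqrt(np.log(N) / (np.pi * np.log(2))))


rng = np.random.default_rng(0)
N, reps = 64, 4000
g = rng.standard_normal((reps, N))
mc_max = float(g.max(axis=1).mean())           # Monte Carlo estimate of E max_k g_k
mc_abs = float(np.abs(g).max(axis=1).mean())   # Monte Carlo estimate of E max_k |g_k|
print(round(mc_max, 3), round(mc_abs, 3), round(fernique_bound(N), 3))
```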